Use a Julia iterator as a regular vector

I was quite puzzled by the following,
sqrt(1:3) * [1 2 3]
# 3x3 Matrix, as expected
sqrt(1:3) * 1:3
# error `colon` has no method matching...
until I realised that 1:3 must be a different kind of beast, i.e. not just a vector as I expected coming from Matlab. My current workaround is to use hcat to convert it to a vector, sqrt(1:3) * hcat(1:3...). Is there a better approach?

The main problem with the second version
sqrt(1:3) * 1:3
is actually operator precedence. The colon operator has very low precedence, so this translates to
(sqrt(1:3) * 1):3
which is nonsensical, hence the error
ERROR: `colon` has no method matching colon(::Array{Float64,1}, ::Int64)
Having said that, if you "fix it" with parentheses, sqrt(1:3) * (1:3), it still doesn't work because * isn't defined for those argument types. Hence you probably want sqrt(1:3) * [1:3]'.

typeof(1:3) gives UnitRange{Int64} (constructor with 1 method), whereas typeof([1:3]) gives: Array{Int64,1}. Note that [1:3] is by default a column vector, so you need to transpose it: sqrt(1:3) * [1:3].'

Related

What are the rules for threading a function over a vector in R?

I have some code which I call with two vectors of different lengths, let's call them A and B. However, I wrote the function with a single element of A in mind, expecting that it will be automatically threaded over A. To be concrete,
A <- rnorm(5)
B <- rnorm(30)
foo <- function(x,B){
  sum( cos(x*B) ) # calculate sum_i cos(x*B[i])
}
sum( exp(foo(A,B)) ) # expecting this to calculate the exponential for each A[j] and add over j
I need to get
Σ_j exp( Σ_i cos(A[j]*B[i]) )
and not
Σ_ij exp(cos(A[j]*B[i])) OR exp(cos(Σ_ij A[j]*B[i]))
I suspect that the last R expression is ambiguous, since the declaration of foo does not know B is always a vector. What are the formal rules and am I right to worry about the ambiguity?
If we want to loop over 'A', use sapply and apply foo to each element of 'A' with an anonymous function call, then take the sum of the output vector:
sum(exp(sapply(A, function(x) foo(x, B))))
In the OP's example with the expression foo(A, B), the product A*B is computed first, and since the lengths of A and B are unequal, the recycling rule kicks in. No warning is emitted only because, by luck, the length of one vector is a multiple of the other.
You can also Vectorize the x input. I think this is what you were expecting. At the end of the day, this works its way down to a mapply() implementation, which is a multivariate sapply, so it's probably best to just do it yourself, as in the solution from akrun.
foo2 <- Vectorize(foo, "x")
sum(exp(foo2(A, B)))
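As a quick sanity check (just a sketch reusing A, B, foo and foo2 from above), the Vectorize() route and the sapply() route should agree:
# Both approaches loop foo over the elements of A, so the totals match.
all.equal(sum(exp(foo2(A, B))),
          sum(exp(sapply(A, function(x) foo(x, B)))))
# [1] TRUE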
The "formal rules" as you put them is quite simply how R does help("Arithmetic").
The binary operators return vectors containing the result of the element by element operations. If involving a zero-length vector the result has length zero. Otherwise, the elements of shorter vectors are recycled as necessary (with a warning when they are recycled only fractionally). The operators are + for addition, - for subtraction, * for multiplication, / for division and ^ for exponentiation.
So when you use x*B, it is doing element-wise multiplication. Nothing changes when you pass A into the function instead of x.
Simply go through your lines one at a time.
x*B will be a vector of length max(length(x), length(B)). When they are not of the same length, R will recycle elements of the shorter vector (i.e., repeat them).
cos(x*B) will be a vector of the same length as step (1), but now the cosine of that value.
sum( cos(x*B) ) will sum that vector, returning a single number.
foo(A,B) does steps (1) through (3), but with your defined A and B. Note that in your example A is recycled 6 times to get to the length of B. In other words, what you entered as A is being used as rep(A, 6) in the multiplication step. Nothing about a function definition in R says that foo(A,B) should be repeated for each element of vector A. So it behaves literally as you wrote it, basically swapping in A for x in the function code.
exp(foo(A,B)) will take the result of foo from step (3) (which is a scalar) and exponentiate it.
sum( exp(foo(A,B)) ) does nothing further: since step (5) returns a scalar, there is nothing to sum over.
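To make step (4) concrete, here is a small sketch reusing A, B and foo from the question; the rep() call below is mine, added only to show what recycling does:
# With length(A) == 5 and length(B) == 30, A is recycled to length 30,
# so foo(A, B) collapses everything into one number.
all.equal(foo(A, B), sum(cos(rep(A, 6) * B)))
# [1] TRUE
# The intended quantity, Σ_j exp( Σ_i cos(A[j]*B[i]) ), needs an explicit
# loop over the elements of A, e.g. the sapply() or Vectorize() versions above.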

Any logical test to distinguish between make-up of numerical objects

I was wondering if there is a way for R to detect the existence or absence of the sign * as used in the following objects?
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
In case you're pursuing this further, you should be aware that counting complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here is made up of sqrt and <the rest of the expression>, which has length 2. If you want to measure complexity you'll need to do some kind of collapsing or counting down the branches of the tree ...
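One possible sketch of such a tree walk (count_leaves is a helper I am inventing here, not part of the answer): it counts every symbol and constant in the parse tree rather than only the top level.
# Recursively count the leaves of an unevaluated expression.
count_leaves <- function(e) {
  if (!is.call(e)) return(1L)                        # symbols and constants are leaves
  sum(vapply(as.list(e), count_leaves, integer(1)))  # recurse into each part of the call
}
count_leaves(quote(3*4))                    # 3  (`*`, 3 and 4)
count_leaves(quote(12))                     # 1
count_leaves(quote(sqrt(2*3 + (7*19)^2)))   # 11: every operator and constant, including the `(` call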
Furthermore, if you first assign a <- 3*4 and then call f(a) you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1 ... the information about the difference between "12" and "3*4" gets lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful in controlling when expressions get evaluated, and it's not easy.
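To see that last point concretely (a quick illustration with the f defined above):
a <- 3*4
f(a)      # [1] 1  -- substitute() only sees the symbol `a`
f(3*4)    # [1] 3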
Hadley Wickham's chapter on expressions might be a good place to read more.

expression vs call

What is the difference between an expression and a call?
For instance:
func <- expression(2*x*y + x^2)
funcDx <- D(func, 'x')
Then:
> class(func)
[1] "expression"
> class(funcDx)
[1] "call"
Calling eval with an envir list works on both of them. But I'm curious what the difference is between the two classes, and under what circumstances I should use expression or call.
You should use expression when you want its capacity to hold more than one expression or call. It really returns an "expression list". The usual situation for the casual user of R is in forming arguments to plotting functions, where the task is forming symbolic expressions for labels. R expression-lists are lists with potentially many items, while calls never are. It's interesting that @hadley's Advanced R Programming suggests "you'll never need to use [the expression function]": http://adv-r.had.co.nz/Expressions.html. Parenthetically, the bquote function is highly useful, but has the limitation that it does not act on more than one expression at a time. I recently hacked a response to such a problem about parsing expressions and got the check, but I thought @mnel's answer was better: R selectively style plot axis labels
The strategy of passing an expression to the evaluator with eval( expr, envir = <a named environment or list> ) is essentially another route to what a function call does. A big difference between expression and call (the functions) is that the latter expects a character object and evaluates it by looking for a function of that name in the symbol table.
When you say that processing both with the eval "works", you are not saying it produces the same results, right? The D function (call) has additional arguments that get substituted and restrict and modify the result. On the other hand evaluation of the expression-object substitutes the values into the symbols.
There seem to be "levels of evaluation":
expression(mean(1:10))
# expression(mean(1:10))
call("mean" , (1:10))
# mean(1:10)
eval(expression(mean(1:10)))
# [1] 5.5
eval(call("mean" , (1:10)))
# [1] 5.5
One might have expected eval(expression(mean(1:10))) to peel off just one level and return a call object, but it continues down the expression tree and evaluates the result. In order to get just the unevaluated function call to mean, I needed to insert a quote:
eval(expression(quote(mean(1:10))))
# mean(1:10)
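A small illustration of the point above about expression() holding several calls at once (the particular calls here are arbitrary examples):
# An expression vector is a list-like container of unevaluated calls and symbols.
ev <- expression(x + 1, mean(1:10), y)
length(ev)       # 3
class(ev[[2]])   # "call"
class(ev[[3]])   # "name"  (a symbol)
eval(ev[[2]])    # [1] 5.5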
From the documentation (?expression):
...an R expression vector is a list of calls, symbols etc, for example as returned by parse.
Notice:
R> class(func[[1]])
[1] "call"
When given an expression, D acts on the first call. If func were simply a call, D would work the same.
R> func2 <- substitute(2 * x * y + x^2)
R> class(func2)
[1] "call"
R> D(func2, 'x')
2 * y + 2 * x
Sometimes, for the sake of consistency, you might need to treat both as expressions. In this case, as.expression comes in handy:
func <- expression(2*x*y + x^2)
funcDx <- as.expression(D(func, 'x'))
> class(func)
[1] "expression"
> class(funcDx)
[1] "expression"

vectorize a bidimensional function in R

I have some true and predicted labels
truth <- factor(c("+","+","-","+","+","-","-","-","-","-"))
pred <- factor(c("+","+","-","-","+","+","-","-","+","-"))
and I would like to build the confusion matrix.
I have a function that works on single (scalar) elements
f <- function(x,y){ sum(y==pred[truth == x])}
however, when I apply it to the outer product, to build the matrix, R seems unhappy.
outer(levels(truth), levels(truth), f)
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
What is the recommended strategy for this in R ?
I can always go through higher order stuff, but that seems clumsy.
I sometimes fail to understand where outer goes wrong, too. For this task I would have used the table function:
> table(truth,pred) # arguably a lot less clumsy than your effort.
     pred
truth - +
    - 4 2
    + 1 3
In this case, you are testing whether a multivalued vector is "==" to a scalar.
outer assumes that the function passed to FUN can take vector arguments and work properly with them. If m and n are the lengths of the two vectors passed to outer, it will first create two vectors of length m*n such that every combination of inputs occurs, and pass these as the two new vectors to FUN. In return, outer expects that FUN will return another vector of length m*n.
The function described in your example doesn't really do this. In fact, it doesn't handle vectors correctly at all.
One way is to define another function that can handle vector inputs properly, or alternatively, if your program actually requires a simple matching, you could use table() as in @DWin's answer.
If you're redefining your function, outer is expecting a function that will be run for inputs:
f(c("+","+","-","-"), c("+","-","+","-"))
and per your example, ought to return,
c(3,1,2,4)
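One concrete way to get such a function, sketched here by simply wrapping the original f rather than rewriting it, is Vectorize():
# Vectorize() makes f map two equal-length vectors element-wise,
# which is the contract outer() relies on.
fv <- Vectorize(f)
fv(c("+","+","-","-"), c("+","-","+","-"))   # 3 1 2 4, as described above
outer(levels(truth), levels(truth), fv)      # the 2x2 matrix of counts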
There is also the small matter of decoding the actual meaning of the error:
Again, if m and n are the lengths of the two vectors passed to outer, it will first create a vector of length m*n, and then reshape it using (basically)
dim(output) = c(m,n)
This is the line that gives an error, because outer is trying to shape the output into a 2x2 matrix (total 2*2 = 4 items) while the function f, assuming no vectorization, has given only 1 output. Hence,
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]

Why are arguments to replacement functions not evaluated lazily?

Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
`g<-` <- function(x, value){print(x);print(substitute(value))}
(note the backticks around the function name)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. The name a should have been printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression is passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well, the question is: bug or feature? What is going on here? I hope some guru users will shed some light on promises and lazy evaluation in R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value) x
`g<-` <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
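For reference, the "Subset assignment" section of the R Language Definition describes the assignment form as being rewritten roughly like the sketch below; note that, as discussed above, the evaluator computes the right-hand side value first rather than passing it as a promise:
# Schematic of how g(x) <- a is carried out; the RHS has already been
# evaluated by the time `g<-` is called.
`*tmp*` <- x
x <- `g<-`(`*tmp*`, value = a)
rm(`*tmp*`)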
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
`g<-` <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
`g<-`(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So R is distinguishing between calling g<- in the course of making an actual assignment and simply calling g<- directly.
