In R, the function outer structurally allows you to take the outer product of two vectors x and y while providing a number of options for the actual function applied to each combination. For example outer(x,y,'-') creates an "outer product" matrix of the elementwise differences between x and y. Does Julia have something similar?
Broadcasting is what Julia does when you add a dot to an operator or function call (for example .* or f.(x)). When the two containers have the same size, it is an element-wise operation: x .* y is element-wise if size(x) == size(y). When the shapes don't match, broadcasting really comes into play. If one argument is a column vector and the other is a row vector, the output is 2D, with out[i,j] combining the ith element of the column vector with the jth element of the row vector. This means x .* y is a peculiar way to write the outer product when one of them is a row vector and the other is a column vector.
In general, this is what broadcast is doing (quoting the Julia manual):
"[Replicating the smaller array to the size of the larger one] is wasteful when dimensions get large, so Julia offers broadcast(), which expands singleton dimensions in array arguments to match the corresponding dimension in the other array without using extra memory."
But this generalizes to all of the other binary operators, so x .- y' is what you're looking for.
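To make the shape of the result concrete, here is a minimal sketch in plain Python (not Julia) of what x .- y' or R's outer(x, y, '-') computes. The helper name outer and the sample values are assumptions made for illustration only; real broadcasting avoids building intermediate copies, which this sketch does not try to model.

```python
def outer(x, y, op):
    """Apply op to every pair (x[i], y[j]); row i, column j holds op(x[i], y[j])."""
    return [[op(xi, yj) for yj in y] for xi in x]

x = [1, 2, 3]
y = [10, 20]

# Element [i][j] is x[i] - y[j], the same layout as x .- y' in Julia.
diff = outer(x, y, lambda a, b: a - b)
print(diff)  # [[-9, -19], [-8, -18], [-7, -17]]
```

Swapping the lambda for another binary function gives the other "outer" variants in the same way.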
I found this previous thread here:
Range standardization (0 to 1) in R
This leads me to my question: when building a function to perform a calculation across all values in a vector, my understanding was that a for-loop would be necessary (because the calculation is applied to every value in the vector). However, apparently that is not the case. What am I misunderstanding?
All of the basic arithmetic operations in R, and most of the basic numerical/mathematical functions, are natively vectorized; that is, they operate position-by-position on elements of vectors (or pairs of elements from matched vectors). So for example if you compute
c(1,3,5) + c(2,4,7)
You don't need an explicit for loop (although there is one in the underlying C code!) to get c(3,7,12) as the answer.
In addition, R has vector recycling rules; any time you call an operation with two vectors, the shorter automatically gets recycled to match the length of the longer one. R doesn't have scalar types, so a single number is stored as a numeric vector of length 1. So if we compute
(x-min(x))/(max(x)-min(x))
max(x) and min(x) are both of length 1, so the denominator is also of length 1. Subtracting min(x) from x, min(x) gets replicated/recycled to a vector the same length as x, and then the pairwise subtraction is done. Finally, the numerator (length == length(x)) is divided by the denominator (length 1) following similar rules.
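The recycling described above can be spelled out explicitly. The following is a rough Python sketch (not R) of the rule, assuming hypothetical helpers recycle and elementwise and made-up sample data; R does all of this implicitly and much faster in compiled code.

```python
from itertools import cycle, islice
from operator import sub, truediv

def recycle(values, length):
    """Repeat values cyclically until there are `length` items (R-style recycling)."""
    return list(islice(cycle(values), length))

def elementwise(op, a, b):
    """Apply op position by position after recycling the shorter argument."""
    n = max(len(a), len(b))
    return [op(u, v) for u, v in zip(recycle(a, n), recycle(b, n))]

x = [2.0, 4.0, 10.0]
lo, hi = [min(x)], [max(x)]                  # length-1 "vectors", as in R

numerator = elementwise(sub, x, lo)          # lo is recycled to length 3
denominator = elementwise(sub, hi, lo)       # both length 1, so the result has length 1
scaled = elementwise(truediv, numerator, denominator)
print(scaled)  # [0.0, 0.25, 1.0]
```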
In general, exploiting this built-in vectorization in R is much faster than writing your own for-loop, so it's definitely worth trying to get the hang of when operations can be vectorized.
This kind of vectorization is implemented in many higher-level languages/libraries that are specialized for numerical computation (pandas and numpy in Python, Matlab, Julia, ...)
I have some code which I call with two vectors of different lengths; let's call them A and B. However, I wrote the function with a single element of A in mind, expecting that it would automatically be threaded over A. To be concrete,
A <- rnorm(5)
B <- rnorm(30)
foo <- function(x,B){
sum( cos(x*B) ) # calculate sum_i cos(x*B[i])
}
sum( exp(foo(A,B)) ) # expecting this to calculate the exponent for each A[j] and add over j
I need to get
Σ_j exp( Σ_i cos(A[j]*B[i]) )
and not
Σ_ij exp(cos(A[j]*B[i])) OR exp(cos(Σ_ij A[j]*B[i]))
I suspect that the last R expression is ambiguous, since the declaration of foo does not know B is always a vector. What are the formal rules and am I right to worry about the ambiguity?
To loop over 'A', use sapply: apply foo to each element of 'A' through an anonymous function, exponentiate the resulting vector, and take its sum.
sum(exp(sapply(A, function(x) foo(x, B))))
In the OP's example with the expression foo(A, B), the product x*B (here A*B) is computed first, and since the lengths of A and B are unequal, the recycling rule applies. No warning is issued only because, by pure luck, the length of one vector (30) is a multiple of the length of the other (5).
You can also Vectorize the x input. I think this is what you were expecting. At the end of the day, this works its way down to an mapply() call, which is a multivariate sapply, so it is probably best to just do it yourself, as in the solution from akrun.
foo2 <- Vectorize(foo, "x")
sum(exp(foo2(A, B)))
The "formal rules", as you put them, are simply the ones R documents in help("Arithmetic"):
The binary operators return vectors containing the result of the element by element operations. If involving a zero-length vector the result has length zero. Otherwise, the elements of shorter vectors are recycled as necessary (with a warning when they are recycled only fractionally). The operators are + for addition, - for subtraction, * for multiplication, / for division and ^ for exponentiation.
So when you use x*B, it is doing element-wise multiplication. Nothing changes when you pass A into the function instead of x.
Simply go through your lines one at a time.
1. x*B will be a vector of length max(length(x), length(B)). When they are not of the same length, R will recycle elements of the shorter vector (i.e., repeat them).
2. cos(x*B) will be a vector of the same length as in step (1), but now holding the cosine of each value.
3. sum( cos(x*B) ) will sum that vector, returning a single number.
4. foo(A,B) does steps (1) through (3), but with your defined A and B. Note that in your example A is recycled 6 times to get to the length of B. In other words, what you entered as A is being used as rep(A, 6) in the multiplication step. Nothing about a function definition in R says that foo(A,B) should be repeated for each element of vector A. So it behaves literally as you wrote it, basically swapping in A for x in the function code.
5. exp(foo(A,B)) will take the result of foo from step (3) (which is a scalar) and exponentiate it.
6. sum( exp(foo(A,B)) ) does nothing further: the result of step (5) is a scalar, so there is nothing to sum.
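The difference between what the original call computes and what was intended can be seen in a small Python sketch (not R). Here foo imitates R's recycling by hand, and the values of A and B are made-up assumptions with len(B) a multiple of len(A), mirroring the 5 and 30 in the question.

```python
import math
from itertools import cycle, islice

def foo(x, B):
    """Imitate R's sum(cos(x*B)), recycling the shorter list against the longer."""
    n = max(len(x), len(B))
    xs = islice(cycle(x), n)
    bs = islice(cycle(B), n)
    return sum(math.cos(u * v) for u, v in zip(xs, bs))

A = [0.5, 1.0, 1.5]
B = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# What sum(exp(foo(A, B))) does in R: A is recycled against B, giving one scalar.
recycled = math.exp(foo(A, B))

# What was intended: one inner sum per element of A, then exp, then the outer sum.
intended = sum(math.exp(foo([a], B)) for a in A)

print(recycled, intended)  # two different numbers
```

The second form corresponds to the sapply and Vectorize solutions above, which call foo once per element of A.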
I've encountered a problem when trying to iterate through a two-dimensional array and sum up the lengths of all the elements inside it in Prolog.
I've tried iterating through a simple 1D array and the result was just as expected. However, difficulties appeared when I started writing the code for a 2D array. Here's my code:
findsum(L):-
atom_row(L, Sum),
write(Sum).
atom_row([Head|Tail], Sum) :-
atom_lengths(Head, Sum),
atom_row(Tail, Sum).
atom_row([], 0).
atom_lengths([Head|Tail], Sum):-
atom_chars(Head, CharList),
length(CharList, ThisLenght),
atom_lengths(Tail, Temp),
Sum is Temp + ThisLenght,
write(ThisLenght).
atom_lengths([], 0).
For example, sum of the elements in array [[aaa, bbbb], [ccccc, dddddd]] should be equal to 18. And this is what I get:
?- findsum([[aaa, bbbb], [ccccc, dddddd]]).
436
false.
The output comes from the write(ThisLenght) line, which runs after each iteration.
Typically it helps (a lot) to split the problem into simpler sub-problems. We can solve the problem, for example, with the following three steps:
first we concatenate the list of lists into a single one-dimension list, for example with append/2;
next we map each atom in that list to the length of that atom, with the atom_length/2 predicate; and
finally we sum up these values, for example with sum_list/2.
So the main predicate looks like:
findsum(LL, S) :-
append(LL, L),
maplist(atom_length, L, NL),
sum_list(NL, S).
Since maplist/3 is a predicate defined in the library(apply), we thus don't need to implement any other predicates.
Note: You can see the implementations of the linked predicates by clicking on the :- icon.
For example:
?- findsum([[aaa, bbbb], [ccccc, dddddd]], N).
N = 18.
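For readers less familiar with Prolog, the same flatten, map, sum pipeline can be sketched in Python. This is only an analogy: the function name find_sum is an assumption, and strings stand in for Prolog atoms.

```python
from itertools import chain

def find_sum(rows):
    """Flatten a list of lists, map each item to its length, and sum the lengths."""
    flat = list(chain.from_iterable(rows))    # plays the role of append/2
    lengths = [len(atom) for atom in flat]    # plays the role of maplist(atom_length, ...)
    return sum(lengths)                       # plays the role of sum_list/2

print(find_sum([["aaa", "bbbb"], ["ccccc", "dddddd"]]))  # 18
```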
I have the following mathematical formula that I want to program as efficiently as possible in R.
$\sum_{i=1}^{N}(x_i-\bar x)(y_i-\bar y)$
Let's say we have the following example data:
x = c(1,5,7,10,11)
y = c(2,4,8,9,12)
How can I easily get this sum with this data without making a separate function?
Isn't there a package or a function that can compute these mathematical sums?
Use the sum command and vectorized operations: sum((x-mean(x))*(y-mean(y)))
The key point here is that the sum function just takes the sum over its argument (vector, matrix, whatever). In this case it's sufficient to give it a vector. The expression is a little more complicated than sum(z), but notice that (x-mean(x))*(y-mean(y)) itself evaluates to a vector (call it z), so the fact that the command is slightly ornate doesn't change how the function works. This is true in many places, not just for the sum command.
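As a cross-check of the arithmetic, here is the same computation written out in plain Python (not R) with the example data from the question. The variable names are assumptions; the intermediate list products plays the role of the vector passed to sum.

```python
from statistics import mean

x = [1, 5, 7, 10, 11]
y = [2, 4, 8, 9, 12]

x_bar = mean(x)   # 6.8
y_bar = mean(y)   # 7

# The "vector" that sum() receives: one product of deviations per position.
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

total = sum(products)
print(total)  # approximately 62, up to floating-point rounding
```

The same value follows from the identity $\sum_i x_i y_i - N \bar x \bar y = 300 - 5 \cdot 6.8 \cdot 7 = 62$.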
This is what my code looks like. a, b, c, and d are scalars; e is a list of vectors. A, B, C and D are vectors.
GetOutput=function(a,b,c,d){
e=FunOther(a,b,c,d)
i=mean(e$f)
j=mean(e$g)
k=abs(mean(e$h))
return(list(b=b,i=i,j=j,k=k))
}
Output=mapply(GetOutput,A,B,C,D)
GetOutput will return a list of 4 scalars. I want to scale this up to a matrix of inputs and a matrix of outputs. I had been using a for loop but I thought I would try mapply instead.
Suppose A, B, C and D have length 100. I just want to get a vector of length 100 that gives me all of the i's so that I can calculate their minima. Then the same for the j's and k's. This is part of a Monte Carlo study. But I am having trouble understanding the Output object. It appears to be a list of lists. What I thought would be a one-liner turns into several operations. The best I can come up with is:
Output2=as.data.frame(t(Output))
OutputMeans=c(mean(as.numeric(Output2$i)),
mean(as.numeric(Output2$j)),
mean(as.numeric(Output2$k)))
This seems just bananas to me. I thought I could operate on Output directly with the mean function without having to bother with all of these transformations.
If you had instead written return( c(b=b,i=i,j=j,k=k) ), then each result collected by mapply would have been a named vector, rather than what you did get: a list of lists. And since the SIMPLIFY argument lets mapply return a matrix, you could have gotten a non-recursive result as well. Because mapply is so versatile, it gives you multiple levels of control over the returned structure.
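The idea of returning flat records and then reading them column by column can be sketched in Python (not R). Everything here is a hypothetical stand-in: get_output does not reproduce FunOther, whose behaviour is not shown in the question, and the inputs are made up.

```python
from statistics import mean

def get_output(a, b, c, d):
    """Hypothetical stand-in for GetOutput: returns a flat tuple (b, i, j, k)."""
    return (b, a + b, c * d, abs(a - d))

A = [1, 2, 3]
B = [4, 5, 6]
C = [7, 8, 9]
D = [10, 11, 12]

# One flat record per parallel set of inputs, similar in spirit to mapply.
results = [get_output(*args) for args in zip(A, B, C, D)]

# Transpose the records so that each output becomes its own list.
b_col, i_col, j_col, k_col = (list(col) for col in zip(*results))

output_means = [mean(i_col), mean(j_col), mean(k_col)]
print(i_col, min(i_col))  # [5, 7, 9] 5
```

Because each record is flat rather than nested, the per-output columns can be summarized directly.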
Another R programming tip: Don't use '$' as an extraction function inside functions. If you are sure of your column name you can use [['colname']], and by using '[[' you can later generalize your function to accept column names as arguments, a feature which '$' will not support.