How does one perform the exp() operation element-wise in Juila? - julia

I'm new to Julia and this seems like a straight-forward operation but for some reason I am not finding the answer anywhere.
I have been going through some tutorials online and they simply use exp(A) where A is a nxm matrix but this gives me a DimensionMismatch error.
I looked through the documentation on the official website in the elementary functions as well as the linear algebra section and googled it multiple times but can't find it for the life of me.

In julia, operations on matrices treat the matrix as an object rather than a collection of numbers. As such exp(A) tries to perform the matrix exponential which is only defined for square matrices. To get element-wise operations on matrices, you use broadcasting which is done with the dot operator. Thus here, you want exp.(A).
This design is used because it allows any scalar operation to be done on arrays
rather than just the ones built in to the language.

The broadcasting operator . always changes a function to "element-wise". Therefore the answer is exp.(A), just like sin.(A), cos.(A), or f.(A) for any user-defined f.

In addition to the above answer, one might also wish to consider the broadcast operator with function piping:
A = rand(-10:10, 3, 3)
A .|> sin .|> inv

Related

Memory Efficient Centered Sparse SVD/PCA (in Julia)?

I have a 3 million x 9 million sparse matrix with several billion non-zero entries. R and Python do not allow sparse matrices with more than MAXINT non-zero entries, thus why I found myself using Julia.
While scaling this data with the standard deviation is trivial, demeaning is of course a no-go in a naive manner as that would create a dense, 200+ terabyte matrix.
The relevant code for doing svd is julia can be found at https://github.com/JuliaLang/julia/blob/343b7f56fcc84b20cd1a9566fd548130bb883505/base/linalg/arnoldi.jl#L398
From my reading, a key element of this code is the AtA_or_AAt struct and several of the functions around those, specifically A_mul_B!. Copied below for your convenience
struct AtA_or_AAt{T,S} <: AbstractArray{T, 2}
A::S
buffer::Vector{T}
end
function AtA_or_AAt(A::AbstractMatrix{T}) where T
Tnew = typeof(zero(T)/sqrt(one(T)))
Anew = convert(AbstractMatrix{Tnew}, A)
AtA_or_AAt{Tnew,typeof(Anew)}(Anew, Vector{Tnew}(max(size(A)...)))
end
function A_mul_B!(y::StridedVector{T}, A::AtA_or_AAt{T}, x::StridedVector{T}) where T
if size(A.A, 1) >= size(A.A, 2)
A_mul_B!(A.buffer, A.A, x)
return Ac_mul_B!(y, A.A, A.buffer)
else
Ac_mul_B!(A.buffer, A.A, x)
return A_mul_B!(y, A.A, A.buffer)
end
end
size(A::AtA_or_AAt) = ntuple(i -> min(size(A.A)...), Val(2))
ishermitian(s::AtA_or_AAt) = true
This is passed into the eigs function, where some magic happens, and the output is then processed in to the relevant components for SVD.
I think the best way to make this work for a 'centering on the fly' type setup is to do something like subclass AtA_or_AAT with a AtA_or_AAT_centered version that more or less mimics the behavior but also stores the column means, and redefines the A_mul_B! function appropriately.
However, I do not use Julia very much and have run in to some difficulty modifying things already. Before I try to dive into this again, I was wondering if I could get feedback if this would be considered an appropriate plan of attack, or if there is simply a much easier way of doing SVD on such a large matrix (I haven't seen it, but I may have missed something).
edit: Instead of modifying base Julia, I've tried writing a "Centered Sparse Matrix" package that keeps the sparsity structure of the input sparse matrix, but enters the column means where appropriate in various computations. It's limited in what it has implemented, and it works. Unfortunately, it is still too slow, despite some pretty extensive efforts to try to optimize things.
After much fuddling with the sparse matrix algorithm, I realized that distributing the multiplication over the subtraction was dramatically more efficient:
If our centered matrix Ac is formed from the original nxm matrix A and its vector of column means M, with a nx1 vector of ones that I will just call 1. We are multiplying by a mxk matrix X
Ac := (A - 1M')
AcX = X
= AX - 1M'X
And we are basically done. Stupidly simple, actually.
AX is can be carried out with the usual sparse matrix multiplication function, M'X is a dense vector-matrix inner product, and the vector of 1's "broadcasts" (to use Julia's terminology) to each row of the AX intermediate result. Most languages have a way of doing that broadcasting without realizing the extra memory allocation.
This is what I've implemented in my package for AcX and Ac'X. The resulting object can then be passed to algorithms, such as the svds function, which only depend on matrix multiplication and transpose multiplication.

Real part of complex number?

I'm just learning R, so please forgive what I'm sure is a very elementary question. How do I take the real part of a complex number?
If you read the help file for complex (?complex), you will see a number of functions for performing complex arithmetic. The details clearly state
The functions Re, Im, Mod, Arg and Conj have their usual interpretation as returning the real part, imaginary part, modulus, argument and complex conjugate for complex values.
Therefore
Use Re:
Re(1+2i)
# 1
For the imaginary part:
Im(1+2i)
# 2
?complex will list other useful functions.
Re(z)
and
Im(z)
would be the functions you're looking for.
See also http://www.johnmyleswhite.com/notebook/2009/12/18/using-complex-numbers-in-r/

How to vectorize equations?

I'm trying to implement the Softmax regression algorithm to solve the K-classifier problem after watching Professor Andrew Ng's lectures on GLM. I thought I understood everything he was saying until it finally came to writing the code to implement the cost function for Softmax regression, which is as follows:
The problem I am having is trying to figure out a way to vectorize this. Again I thought I understood how to go about vectorizing equations like this since I was able to do it for linear and logistic regression, but after looking at that formula I am stuck.
While I would love to figure out a vectorized solution for this (I realize there is a similar question posted already: Vectorized Implementation of Softmax Regression), what I am more interested in is whether any of you can tell me a way (your way) to methodically convert equations like this into vectorized forms. For example, for those of you who are experts or seasoned veterans in ML, when you read of new algorithms in the literature for the first time, and see them written in similar notation to the equation above, how do you go about converting them to vectorized forms?
I realize I might be coming off as being like the student who is asking Mozart, "How do you play the piano so well?" But my question is simply motivated from a desire to become better at this material, and assuming that not everyone was born knowing how to vectorize equations, and so someone out there must have devised their own system, and if so, please share! Many thanks in advance!
Cheers
This one looks pretty hard to vectorize since you are doing exponentials inside of your summations. I assume you are raising e to arbitrary powers. What you can vectorize is the second term of the expression \sum \sum theta ^2 just make sure to use .* operator in matlab enter link description here to computer \theta ^2
Same goes for the inner terms of the ratio of the that goes into the logarithm. \theta ' x^(i) is vectorizable expression.
You might also benefit from a memoization or dynamic programming technique and try to reuse the results of computations of e^\theta' x^(i).
Generally in my experience the way to vectorize is first to get non-vectorized implementation working. Then try to vectorize the most obvious parts of your computation. At every step tweak your function very little and always check if you get the same result as non-vectorized computation. Also, having multiple test cases is very helpful.
The help files that come with Octave have this entry:
19.1 Basic Vectorization
To a very good first approximation, the goal in vectorization is to
write code that avoids loops and uses whole-array operations. As a
trivial example, consider
for i = 1:n
for j = 1:m
c(i,j) = a(i,j) + b(i,j);
endfor
endfor
compared to the much simpler
c = a + b;
This isn't merely easier to write; it is also internally much easier to
optimize. Octave delegates this operation to an underlying
implementation which, among other optimizations, may use special vector
hardware instructions or could conceivably even perform the additions in
parallel. In general, if the code is vectorized, the underlying
implementation has more freedom about the assumptions it can make in
order to achieve faster execution.
This is especially important for loops with "cheap" bodies. Often it
suffices to vectorize just the innermost loop to get acceptable
performance. A general rule of thumb is that the "order" of the
vectorized body should be greater or equal to the "order" of the
enclosing loop.
As a less trivial example, instead of
for i = 1:n-1
a(i) = b(i+1) - b(i);
endfor
write
a = b(2:n) - b(1:n-1);
This shows an important general concept about using arrays for
indexing instead of looping over an index variable.  Index Expressions.
Also use boolean indexing generously. If a condition
needs to be tested, this condition can also be written as a boolean
index. For instance, instead of
for i = 1:n
if (a(i) > 5)
a(i) -= 20
endif
endfor
write
a(a>5) -= 20;
which exploits the fact that 'a > 5' produces a boolean index.
Use elementwise vector operators whenever possible to avoid looping
(operators like '.*' and '.^').  Arithmetic Ops. For simple
inline functions, the 'vectorize' function can do this automatically.
-- Built-in Function: vectorize (FUN)
Create a vectorized version of the inline function FUN by replacing
all occurrences of '', '/', etc., with '.', './', etc.
This may be useful, for example, when using inline functions with
numerical integration or optimization where a vector-valued
function is expected.
fcn = vectorize (inline ("x^2 - 1"))
=> fcn = f(x) = x.^2 - 1
quadv (fcn, 0, 3)
=> 6
See also:  inline,  formula,
 argnames.
Also exploit broadcasting in these elementwise operators both to
avoid looping and unnecessary intermediate memory allocations.
 Broadcasting.
Use built-in and library functions if possible. Built-in and
compiled functions are very fast. Even with an m-file library function,
chances are good that it is already optimized, or will be optimized more
in a future release.
For instance, even better than
a = b(2:n) - b(1:n-1);
is
a = diff (b);
Most Octave functions are written with vector and array arguments in
mind. If you find yourself writing a loop with a very simple operation,
chances are that such a function already exists. The following
functions occur frequently in vectorized code:
Index manipulation
* find
* sub2ind
* ind2sub
* sort
* unique
* lookup
* ifelse / merge
Repetition
* repmat
* repelems
Vectorized arithmetic
* sum
* prod
* cumsum
* cumprod
* sumsq
* diff
* dot
* cummax
* cummin
Shape of higher dimensional arrays
* reshape
* resize
* permute
* squeeze
* deal
Also look at these pages from a Stanford ML wiki for some more guidance with examples.
http://ufldl.stanford.edu/wiki/index.php/Vectorization
http://ufldl.stanford.edu/wiki/index.php/Logistic_Regression_Vectorization_Example
http://ufldl.stanford.edu/wiki/index.php/Neural_Network_Vectorization

algorithm to find derivative

I'm writing program in Python and I need to find the derivative of a function (a function expressed as string).
For example: x^2+3*x
Its derivative is: 2*x+3
Are there any scripts available, or is there something helpful you can tell me?
If you are limited to polynomials (which appears to be the case), there would basically be three steps:
Parse the input string into a list of coefficients to x^n
Take that list of coefficients and convert them into a new list of coefficients according to the rules for deriving a polynomial.
Take the list of coefficients for the derivative and create a nice string describing the derivative polynomial function.
If you need to handle polynomials like a*x^15125 + x^2 + c, using a dict for the list of coefficients may make sense, but require a little more attention when doing the iterations through this list.
sympy does it well.
You may find what you are looking for in the answers already provided. I, however, would like to give a short explanation on how to compute symbolic derivatives.
The business is based on operator overloading and the chain rule of derivatives. For instance, the derivative of v^n is n*v^(n-1)dv/dx, right? So, if you have v=3*x and n=3, what would the derivative be? The answer: if f(x)=(3*x)^3, then the derivative is:
f'(x)=3*(3*x)^2*(d/dx(3*x))=3*(3*x)^2*(3)=3^4*x^2
The chain rule allows you to "chain" the operation: each individual derivative is simple, and you just "chain" the complexity. Another example, the derivative of u*v is v*du/dx+u*dv/dx, right? If you get a complicated function, you just chain it, say:
d/dx(x^3*sin(x))
u=x^3; v=sin(x)
du/dx=3*x^2; dv/dx=cos(x)
d/dx=v*du+u*dv
As you can see, differentiation is only a chain of simple operations.
Now, operator overloading.
If you can write a parser (try Pyparsing) then you can request it to evaluate both the function and derivative! I've done this (using Flex/Bison) just for fun, and it is quite powerful. For you to get the idea, the derivative is computed recursively by overloading the corresponding operator, and recursively applying the chain rule, so the evaluation of "*" would correspond to u*v for function value and u*der(v)+v*der(u) for derivative value (try it in C++, it is also fun).
So there you go, I know you don't mean to write your own parser - by all means use existing code (visit www.autodiff.org for automatic differentiation of Fortran and C/C++ code). But it is always interesting to know how this stuff works.
Cheers,
Juan
Better late than never?
I've always done symbolic differentiation in whatever language by working with a parse tree.
But I also recently became aware of another method using complex numbers.
The parse tree approach consists of translating the following tiny Lisp code into whatever language you like:
(defun diff (s x)(cond
((eq s x) 1)
((atom s) 0)
((or (eq (car s) '+)(eq (car s) '-))(list (car s)
(diff (cadr s) x)
(diff (caddr s) x)
))
; ... and so on for multiplication, division, and basic functions
))
and following it with an appropriate simplifier, so you get rid of additions of 0, multiplying by 1, etc.
But the complex method, while completely numeric, has a certain magical quality. Instead of programming your computation F in double precision, do it in double precision complex.
Then, if you need the derivative of the computation with respect to variable X, set the imaginary part of X to a very small number h, like 1e-100.
Then do the calculation and get the result R.
Now real(R) is the result you would normally get, and imag(R)/h = dF/dX
to very high accuracy!
How does it work? Take the case of multiplying complex numbers:
(a+bi)(c+di) = ac + i(ad+bc) - bd
Now suppose the imaginary parts are all zero, except we want the derivative with respect to a.
We set b to a very small number h. Now what do we get?
(a+hi)(c) = ac + hci
So the real part of this is ac, as you would expect, and the imaginary part, divided by h, is c, which is the derivative of ac with respect to a.
The same sort of reasoning seems to apply to all the differentiation rules.
Symbolic Differentiation is an impressive introduction to the subject-at least for non-specialist like me :) The code is written in C++ btw.
Look up automatic differentiation. There are tools for Python. Also, this.
If you are thinking of writing the differentiation program from scratch, without utilizing other libraries as help, then the algorithm/approach of computing the derivative of any algebraic equation I described in my blog will be helpful.
You can try creating a class that will represent a limit rigorously and then evaluate it for (f(x)-f(a))/(x-a) as x approaches a. That should give a pretty accurate value of the limit.
if you're using string as an input, you can separate individual terms using + or - char as a delimiter, which will give you individual terms. Now you can use power rule to solve for each term, say you have x^3 which using power rule will give you 3x^2, or suppose you have a more complicated term like a/(x^3) or a(x^-3), again you can single out other variables as a constant and now solving for x^-3 will give you -3a/(x^2). power rule alone should be enough, however it will require extensive use of the factorization.
Unless any already made library deriving it's quite complex because you need to parse and handle functions and expressions.
Deriving by itself it's an easy task, since it's mechanical and can be done algorithmically but you need a basic structure to store a function.

Summation notation in Haskell

On the Wikipedia page about summation it says that the equivalent operation in Haskell is to use foldl. My question is: Is there any reason why it says to use this instead of sum? Is one more 'purist' than the other, or is there no real difference?
foldl is a general tail-recursive reduce function. Recursion is the usual way of thinking about manipulating lists of items in a functional programming languages, and provides an alternative to loop iteration that is often much more elegant. In the case of a reduce function like fold, the tail-recursive implementation is very efficient. As others have explained, sum is then just a convenient mnemonic for foldl (+) 0 l.
Presumably its use on the wikipedia page is to illustrate the general principle of summation through tail-recursion. But since the Haskell Prelude library contains sum, which is shorter and more obvious to understand, you should use that in your code.
Here's a nice discussion of Haskell's fold functions with simple examples that's well worth reading.
I don't see where it says anything about Haskell or foldl on that Wikipedia page, but sum in Haskell is just a more specific case of foldl. It can be implemented like this, for example:
sum l = foldl (+) 0 l
Which can be reduced to:
sum = foldl (+) 0
One thing to note is that sum may be lazier than you would want, so consider using foldl'.
As stated by the others, there's no difference. However, a sum-call is easier to read than a fold-call, so I'd go for sum if you need summation.
There is no difference. That page is simply saying that sum is implemented using foldl. Just use sum whenever you need to calculate the sum of a list of numbers.
The concept of summation can be extended to non-numeric types: all you need is something equivalent to a (+) operation and a zero value. In other words, you need a monoid. This leads to the Haskell function "mconcat", which returns the sum of a list of values of a monoid type. The default "mconcat" of course is defined in terms of "mappend" which is the plus operation.

Resources