Using `\` instead of `/` in Julia - julia

For scalars, the \ (solve linear system) operator is equivalent to the division operator /. Is the performance similar?
I ask because currently my code has a line like
x = (1 / alpha) * averylongfunctionname(input1, input2, input3)
Visually, it is important that the division by alpha happens on the "left," so I am considering replacing this with
x = alpha \ averylongfunctionname(input1, input2, input3)
What is the best practice in this situation, from the standpoint of style and the standpoint of performance?
Here are some perplexing benchmarking results:
julia> using BenchmarkTools
[ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]
julia> @btime x[1]\sum(x) setup=(x=rand(100))
15.014 ns (0 allocations: 0 bytes)
56.23358979466163
julia> @btime (1/x[1]) * sum(x) setup=(x=rand(100))
13.312 ns (0 allocations: 0 bytes)
257.4552413802698
julia> @btime sum(x)/x[1] setup=(x=rand(100))
14.929 ns (0 allocations: 0 bytes)
46.25209548841374
They are all about the same, but I'm surprised that the (1 / x) * foo approach has the best performance.

Scalar / and \ really should have the same meaning and performance. Let's define these two test functions:
f(a, b) = a / b
g(a, b) = b \ a
We can then see that they produce identical LLVM code:
julia> @code_llvm f(1.5, 2.5)
;  @ REPL[29]:1 within `f'
define double @julia_f_380(double %0, double %1) {
top:
; ┌ @ float.jl:335 within `/'
   %2 = fdiv double %0, %1
; └
  ret double %2
}
julia> @code_llvm g(1.5, 2.5)
;  @ REPL[30]:1 within `g'
define double @julia_g_382(double %0, double %1) {
top:
; ┌ @ operators.jl:579 within `\'
; │┌ @ float.jl:335 within `/'
    %2 = fdiv double %0, %1
; └└
  ret double %2
}
And the same machine code too. I'm not sure what is causing the differences in the @btime results, but I'm pretty sure that the difference between / and \ is an illusion.
As to x*(1/y), that does not compute the same thing as x/y: it will be potentially less accurate since there is rounding done when computing 1/y and then that rounded value is multiplied by x, which also rounds. For example:
julia> 17/0.7
24.28571428571429
julia> 17*(1/0.7)
24.285714285714285
Since floating-point division is guaranteed to be correctly rounded, dividing directly is never less accurate than multiplying by a rounded reciprocal. If the divisor is shared by a lot of loop iterations, however, you can get a speedup by rewriting the computation like this, since floating-point multiplication is usually faster than division (although timing on my current computer does not show this). Be aware that this comes at a loss of accuracy, and if the divisor is not shared there would still be a loss of accuracy and no performance gain.
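The rounding argument can also be checked outside Julia. Here is a minimal sketch in Python (used only for illustration; its floats are the same IEEE 754 doubles as Julia's Float64) that measures both expressions against the exact rational quotient of the two doubles. The helper name `abs_error` is hypothetical.

```python
from fractions import Fraction

x, y = 17.0, 0.7

direct = x / y                # one correctly rounded division
via_reciprocal = x * (1 / y)  # two roundings: 1/y first, then the product

# Exact quotient of the two doubles, as a rational number.
exact = Fraction(x) / Fraction(y)

def abs_error(value):
    """Exact distance between a double and the true quotient."""
    return abs(Fraction(value) - exact)

print(direct, via_reciprocal)
print(abs_error(direct) <= abs_error(via_reciprocal))
```

Because the division is correctly rounded, its error can never exceed that of any other double, including the reciprocal-based result.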

I don't know, but I suggest trying the BenchmarkTools package: it can help you evaluate the performance of the two statements.
I think that the best choice is (1/x)*foo, for two reasons:
it has the best performance (although not by much compared to the other ones);
it is clearer to another person reading the code.

Related

Extremely Sparse Integer Quadratic Programming

I am working on an optimization problem with a huge number of variables (upwards of hundreds of millions). Each of them should be a 0-1 binary variable.
I can write it in the form (maximize x'Qx) where Q is positive semi-definite, and I am using Julia, so the package COSMO.jl seems like a great fit. However, there is a ton of sparsity in my problem. Q is 0 except on approximately sqrt(|Q|) entries, and for the constraints there are approximately sqrt(|Q|) linear constraints on the variables.
I can describe this system pretty easily using SparseArrays, but it appears the most natural way to input problems into COSMO uses standard arrays. Is there a way I can take advantage of the sparsity in this massive problem?
While there is no sample code in your question, perhaps this could help:
JuMP works with sparse arrays, so perhaps the easiest thing is simply to use one when constructing the goal function:
julia> using JuMP, SparseArrays, COSMO
julia> m = Model(with_optimizer(COSMO.Optimizer));
julia> q = sprand(Bool, 20, 20,0.05) # for readability I use a binary q
20×20 SparseMatrixCSC{Bool, Int64} with 21 stored entries:
⠀⠀⠀⡔⠀⠀⠀⠀⡀⠀
⠀⠀⠂⠀⠠⠀⠀⠈⠑⠀
⠀⠀⠀⠀⠀⠤⠀⠀⠀⠀
⠀⢠⢀⠄⠆⠀⠂⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠄⠀⠀⠌
julia> @variable(m, x[1:20], Bin);
julia> x'*q*x
x[1]*x[14] + x[14]*x[3] + x[15]*x[8] + x[16]*x[5] + x[18]*x[4] + x[18]*x[13] + x[19]*x[14] + x[20]*x[11]
You can see that the equation gets correctly reduced.
Indeed you could check the performance with a very sparse q having 100M elements:
julia> q = sprand(10000, 10000,0.000001)
10000×10000 SparseMatrixCSC{Float64, Int64} with 98 stored entries:
...
julia> @variable(m,z[1:10000], Bin);
julia> @btime $z'*$q*$z
1.276 ms (51105 allocations: 3.95 MiB)
You can see that you are just getting the expected performance when constructing the goal function.
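The reduction works because only the stored entries of q contribute terms to x'qx. A minimal sketch of that idea in plain Python (this is not JuMP or COSMO code; the dictionary-of-keys layout and the name `sparse_quadratic_form` are assumptions made for illustration):

```python
def sparse_quadratic_form(entries, x):
    """Evaluate x' Q x where Q is given as {(i, j): value} for stored entries only."""
    return sum(value * x[i] * x[j] for (i, j), value in entries.items())

# A 5x5 matrix with three stored entries; every other entry is implicitly zero.
q = {(0, 3): 1.0, (2, 2): 2.0, (4, 1): -1.0}
x = [1, 0, 1, 1, 1]

print(sparse_quadratic_form(q, x))
```

The work is proportional to the number of stored entries rather than to the square of the number of variables.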

Calculate pi in prolog recursively with Leibniz formula

I want to learn some Prolog and found an exercise to calculate pi recursively with a predicate such as pi(10, Result). I don't want it to be tail recursive because I find tail recursion to be easier. I've been trying for hours but can't reach a solution; this is how far I've come:
(I'm using Leibniz' pi formula as reference)
pi(0, 0).
pi(Next, Result) :-
Num is -1**(Next + 1),
Part is Num / (2 * Next - 1),
N1 is Next -1,
pi(N1, R),
Result is Part + R.
Now, I'm aware that the addition at the end is wrong. Also I need to multiply the end result by 4 and I don't know how to do that. Would be glad if anyone could help out. And no, this is not a homework or anything. :)
Here's a slightly different twist that terminates upon reaching a given precision. It is also tail recursive. Because the Leibniz series converges very slowly, the formula is a stack hog when implemented with simple recursion; it's not an algorithm well suited to a recursive solution in any language. However, a smart Prolog interpreter can take advantage of the tail recursion and avoid that. Just by way of example, this version only accepts a precision within a specific range.
pi(Precision, Pi) :-
Precision > 0.0000001,
Precision < 0.1,
pi_over_4(1, 1, Precision/4, 1, Pi_over_4), % Compensate for *4 later
Pi is Pi_over_4 * 4.
pi_over_4(AbsDenominator, Numerator, Precision, Sum, Result) :-
NewAbsDenominator is AbsDenominator + 2,
NewNumerator is -Numerator,
NewSum is Sum + NewNumerator/NewAbsDenominator,
( abs(NewSum - Sum) < Precision
-> Result = NewSum
; pi_over_4(NewAbsDenominator, NewNumerator, Precision, NewSum, Result)
).
2 ?- pi(0.0001, P).
P = 3.1416426510898874.
3 ?- pi(0.00001, P).
P = 3.141597653564762.
4 ?- pi(0.000005, P).
P = 3.141595153583494.
This is strictly an imperative use of Prolog, which isn't where Prolog's strengths lie.
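For comparison with the question's attempt, here is a minimal sketch of the plain (non-tail) recursion, written in Python rather than Prolog. It assumes the question's convention that the first argument counts the terms of the Leibniz series, and it keeps the partial sum separate from the single final multiplication by 4. The names `leibniz_sum` and `leibniz_pi` are hypothetical.

```python
def leibniz_sum(n):
    """Sum of the first n terms of 1 - 1/3 + 1/5 - ... (non-tail recursion)."""
    if n == 0:
        return 0.0
    sign = 1 if n % 2 == 1 else -1       # (-1)^(n+1)
    term = sign / (2 * n - 1)
    return term + leibniz_sum(n - 1)     # the addition runs after the call returns

def leibniz_pi(n):
    """Approximate pi by multiplying the partial sum by 4 exactly once."""
    return 4 * leibniz_sum(n)

print(leibniz_pi(10))
```

The recursion depth grows linearly with the number of terms, which is the "stack hog" behaviour described above.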

Julia on Float versus Octave on Float

Version: v"0.5.0-dev+1259"
Context: The goal is to calculate the Rademacher penalty bound for a given number of data points n with respect to the VC dimension dvc and the probability expressed by delta.
Please consider this Julia code:
#Growth function on any n points with respect to VC-dimension
function mh(n, dvc)
if n <= dvc
2^n #A
else
n^dvc #B
end
end
#Rademacher penalty bound
function rademacher_penalty_bound(n::Int, dvc::Int, delta::Float64)
sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
and the equivalent code in Octave/Matlab:
%Growth function on n points for a given VC dimension (dvc)
function md = mh(n, dvc)
if n <= dvc
md= 2^n;
else
md = n^dvc;
end
end
%Rademacher penalty bound
function epsilon = rademacher_penalty_bound (n, dvc, delta)
epsilon = sqrt ((2*log(2*n*mh(n,dvc)))/n) + sqrt((2/n)*log(1/delta)) + 1/n;
end
Problem:
When I start testing it I receive the following results:
Julia first:
julia> rademacher_penalty_bound(50, 50, 0.05) #50 points
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05) #500 points
ERROR: DomainError:
[inlined code] from math.jl:137
in rademacher_penalty_bound at none:2
in eval at ./boot.jl:264
Now Octave:
octave:17> rademacher_penalty_bound(50, 50, 0.05)
ans = 1.6194
octave:18> rademacher_penalty_bound(500, 50, 0.05)
ans = 1.2387
Question: According to Noteworthy differences from MATLAB, I think I followed the rule of thumb ("literal numbers without a decimal point (such as 42) create integers instead of floating point numbers..."). The code crashes when the number of points exceeds 51 (line #B in mh). Could someone with more experience look at the code and say what I should improve or change?
While BigInt and BigFloat will work here, they're serious overkill. The real issue is that you're doing integer exponentiation in Julia and floating-point exponentiation in Octave/Matlab. So you just need to change mh to use floats instead of integers for exponents:
mh(n, dvc) = n <= dvc ? 2^float(n) : n^float(dvc)
rademacher_penalty_bound(n, dvc, δ) =
√((2log(2n*mh(n,dvc)))/n) + √(2log(1/δ)/n) + 1/n
With these definitions, you get the same results as Octave/Matlab:
julia> rademacher_penalty_bound(50, 50, 0.05)
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05)
1.2386545010981596
In Octave/Matlab, even when you input a literal without a decimal point, you still get a float – you have to do an explicit cast to int type. Also, exponentiation in Octave/Matlab always converts to float first. In Julia, x^2 is equivalent to x*x which prohibits conversion to floating-point.
Although BigInt and BigFloat are excellent tools when they are necessary, they should usually be avoided, since they are overkill and slow.
In this case, the problem is indeed the difference between Octave, which treats everything as a floating-point number, and Julia, which treats e.g. 2 as an integer.
So the first thing to do is to use floating-point numbers in Julia too:
function mh(n, dvc)
if n <= dvc
2.0 ^ n
else
Float64(n) ^ dvc
end
end
This already helps, e.g. mh(50, 50) works.
However, the correct solution for this problem is to look at the code more carefully, and realise that the function mh only occurs inside a log:
log(2.0*n*mh(n,dvc))
We can use the laws of logarithms to rewrite this as
log(2.0*n) + log_mh(n, dvc)
where log_mh is a new function, which returns the logarithm of the result of mh. Of course, this should not be written directly as log(mh(n, dvc)), but is rather a new function:
function log_mh(n, dvc)
if n <= dvc
n * log(2.0)
else
dvc * log(n)
end
end
In this way, you will be able to use huge numbers without overflow.
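The log-space rewrite can be sketched end to end. This is a Python illustration of the same idea, not the answer's Julia code; the function names mirror the ones above, and the last call uses hypothetical inputs chosen so that n^dvc would overflow a double.

```python
import math

def log_mh(n, dvc):
    """Logarithm of the growth-function bound, without ever forming 2^n or n^dvc."""
    if n <= dvc:
        return n * math.log(2.0)
    return dvc * math.log(n)

def rademacher_penalty_bound(n, dvc, delta):
    """Same bound as in the question, using log(a*b) = log(a) + log(b)."""
    log_term = math.log(2.0 * n) + log_mh(n, dvc)
    return (math.sqrt(2.0 * log_term / n)
            + math.sqrt((2.0 / n) * math.log(1.0 / delta))
            + 1.0 / n)

print(rademacher_penalty_bound(50, 50, 0.05))
print(rademacher_penalty_bound(500, 50, 0.05))
print(rademacher_penalty_bound(10**6, 10**4, 0.05))  # n^dvc would overflow a double here
```

The first two calls should agree with the Octave and Julia results quoted above up to floating-point rounding.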
I don't know whether BigFloat results are acceptable, but in the Julia version you can use BigInt:
#Growth function on any n points with respect to VC-dimension
function mh(n, dvc)
if n <= dvc
(BigInt(2))^n #A
else
n^dvc #B
end
end
#Rademacher penalty bound
function rademacher_penalty_bound(n::BigInt, dvc::BigInt, delta::Float64)
sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
rademacher_penalty_bound(BigInt(500), BigInt(500), 0.05)
# => 1.30055251010957621105182244420.....
Because by default a Julia Int is a "machine-size" integer, a 64-bit integer for the common x86-64 platform, whereas Octave uses floating point. So in Julia mh(500,50) overflows. You can fix it by replacing mh() as follows:
function mh(n, dvc)
n2 = BigInt(n) # Or n2 = Float64(n)
if n <= dvc
2^n2 #A
else
n2^dvc #B
end
end

dividing by 2 and ceiling until remains 1

I have the following algorithm, defined only for natural numbers:
rounds(n)={1, if n=1; 1+rounds(ceil(n/2)), else}
Written in a programming language, this is
int rounds(int n){
    if(n==1)
        return 1;
    return 1+rounds((n+1)/2); /* integer ceil(n/2); a plain n/2 would truncate before ceil is applied */
}
I think this has time complexity O(log n).
Is there a better complexity?
Start by listing the results from 1 upward,
rounds(1) = 1
rounds(2) = 1 + rounds(2/2) = 1 + 1 = 2
Next, when ceil(n/2) is 2, rounds(n) will be 3. That's for n = 3 and n = 4.
rounds(3) = rounds(4) = 3
then, when ceil(n/2) is 3 or 4, the result will be 4. 3 <= ceil(n/2) <= 4 happens if and only if 2*3-1 <= n <= 2*4, so
rounds(5) = ... = rounds(8) = 4
Continuing, you can see that
rounds(n) = k+2 if 2^k < n <= 2^(k+1)
by induction.
You can rewrite that to
rounds(n) = 2 + floor(log_2(n-1)) if n > 1 [and rounds(1) = 1]
and mathematically, you can also treat n = 1 uniformly by rewriting it to
rounds(n) = 1 + floor(log_2(2*n-1))
The last formula has the potential for overflow if you're using fixed-width types, though.
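As a sanity check on the closed form, here is a small sketch in Python (chosen because its integers do not overflow; the function names are hypothetical). It relies on the fact that for a positive integer m, `m.bit_length()` equals floor(log_2(m)) + 1, so 1 + floor(log_2(2n-1)) is simply the bit length of 2n-1.

```python
def rounds_recursive(n):
    """Direct transcription of the recurrence: halve with ceiling until 1 remains."""
    if n == 1:
        return 1
    return 1 + rounds_recursive((n + 1) // 2)   # (n + 1) // 2 is ceil(n / 2)

def rounds_closed_form(n):
    """1 + floor(log2(2n - 1)), computed with bit_length instead of a float logarithm."""
    return (2 * n - 1).bit_length()

# Compare both definitions on a range of inputs.
print(all(rounds_recursive(n) == rounds_closed_form(n) for n in range(1, 10001)))
```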
So the question is
how fast can you compare a number to 1,
how fast can you subtract 1 from a number,
how fast can you compute the (floor of the) base-2 logarithm of a positive integer?
For a fixed-width type, thus a bounded range, all these are of course O(1) operations, but then you're probably still interested in making it as efficient as possible, even though computational complexity doesn't enter the game.
For native machine types - which int and long usually are - comparing and subtracting integers are very fast machine instructions, so the only possibly problematic one is the base-2 logarithm.
Many processors have a machine instruction to count the leading 0-bits in a value of the machine types, and if that is made accessible by the compiler, you will get a very fast implementation of the base-2 logarithm. If not, you can get a faster version than the recursion using one of the classic bit-hacks.
For example, sufficiently recent versions of gcc and clang have a __builtin_clz (resp. __builtin_clzl for 64-bit types) that maps to the bsr* instruction if that is present on the processor, and presumably a good implementation using some bit-twiddling if it isn't provided by the processor.
The version
#include <limits.h> /* for CHAR_BIT */
unsigned rounds(unsigned long n) {
    if (n <= 1) return n;
    return sizeof n * CHAR_BIT + 1 - __builtin_clzl(n-1);
}
using the bsrq instruction takes (on my box) 0.165 seconds to compute rounds for 1 to 100,000,000, the bit-hack
unsigned rounds(unsigned n) {
if (n <= 1) return n;
--n;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n -= (n >> 1) & 0x55555555;
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
return ((n * 0x01010101) >> 24)+1;
}
takes 0.626 seconds, and the naive loop
unsigned rounds(unsigned n) {
unsigned r = 1;
while(n > 1) {
++r;
n = (n+1)/2;
}
return r;
}
takes 1.865 seconds.
If you don't use a fixed-width type, but arbitrary precision integers, things change a bit. The naive loop (or recursion) still uses Θ(log n) steps, but the steps take Θ(log n) time (or worse) on average, so overall you have a Θ(log² n) algorithm (or worse). Then using the formula above can not only offer an implementation with lower constant factors, but one with lower algorithmic complexity.
Comparing to 1 can be done in constant time for suitable representations, O(log n) is the worst case for reasonable representations.
Subtracting 1 from a positive integer takes O(log n) for reasonable representations.
Computing the (floor of the) base-2 logarithm can be done in constant time for some representations, and in O(log n) for other reasonable representations [if they use a power-of-2 base, which all arbitrary precision libraries I'm semi-familiar with do; if they used a power-of-10 base, that would be different].
If you think of the algorithm as iterative and the numbers as binary, then this function shifts out the lowest bit and increases the number by 1 if it was a 1 that was shifted out. Thus, except for the increment, it counts the number of bits in the number (that is, the position of the highest 1). The increment will eventually increase the result by one, except when the number is of the form 1000.... Thus, you get the number of bits plus one, or the number of bits if the number is a power of two. Depending on your machine model, this might be faster to calculate than O(log n).

Prolog factorial recursion

I'm having trouble understanding the following factorial program
fact1(0,Result) :-
Result is 1.
fact1(N,Result) :-
N > 0,
N1 is N-1,
fact1(N1,Result1),
Result is Result1*N.
When fact1 is called nested within the second fact1, doesn't that mean that the last line, Result is Result1*N., is never called? Or in Prolog does the last line get executed before the recursive call?
BTW once you got the basic recursion understood, try to achieve tail recursion whenever possible, here it'd be:
factorial(N, R) :- factorial(N, 1, R).
factorial(0, R, R) :- !.
factorial(N, Acc, R) :-
NewN is N - 1,
NewAcc is Acc * N,
factorial(NewN, NewAcc, R).
Tail recursion, unlike the recursion you used previously, allows interpreter/compiler to flush context when going on to the next step of recursion. So let's say you calculate factorial(1000), your version will maintain 1000 contexts while mine will only maintain 1. That means that your version will eventually not calculate the desired result but just crash on an Out of call stack memory error.
You can read more about it on Wikipedia.
No, the recursive call happens first! It has to, or else that last clause is meaningless. The algorithm breaks down to:
factorial(0) => 1
factorial(n) => factorial(n-1) * n;
As you can see, you need to calculate the result of the recursion before multiplying in order to return a correct value!
Your prolog implementation probably has a way to enable tracing, which would let you see the whole algorithm running. That might help you out.
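If tracing is not at hand, the order of evaluation can be made visible with a small sketch. This is Python rather than Prolog, and the `trace` parameter is a hypothetical addition used only to record when each multiplication happens.

```python
def fact1(n, trace):
    """Mirror of fact1/2: the multiplication runs only after the inner call returns."""
    if n == 0:
        trace.append("base case: 0! = 1")
        return 1
    inner = fact1(n - 1, trace)      # the recursive call happens first
    result = inner * n               # then the step "Result is Result1*N"
    trace.append(f"{n}! = {inner} * {n} = {result}")
    return result

steps = []
print(fact1(3, steps))
print("\n".join(steps))
```

The recorded steps start at the base case and work back outward, which shows that every pending multiplication waits for the call nested inside it.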
Generally speaking, @m09's answer is basically right about the importance of tail-recursion.
For big N, calculating the product differently wins! Think "binary tree", not "linear list"...
Let's try both ways and compare the runtimes. First, @m09's factorial/2:
?- time((factorial(100000,_),false)).
% 200,004 inferences, 1.606 CPU in 1.606 seconds (100% CPU, 124513 Lips)
false.
Next, we do it tree-style—using meta-predicate reduce/3 together with lambda expressions:
?- time((numlist(1,100000,Xs),reduce(\X^Y^XY^(XY is X*Y),Xs,_),false)).
% 1,300,042 inferences, 0.264 CPU in 0.264 seconds (100% CPU, 4922402 Lips)
false.
Last, let's define and use dedicated auxiliary predicate x_y_product/3:
x_y_product(X, Y, XY) :- XY is X*Y.
What's to gain? Let's ask the stopwatch!
?- time((numlist(1,100000,Xs),reduce(x_y_product,Xs,_),false)).
% 500,050 inferences, 0.094 CPU in 0.094 seconds (100% CPU, 5325635 Lips)
false.
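The "binary tree" idea can be sketched as follows, in Python rather than Prolog. It assumes that reduce/3 above combines the list pairwise in a balanced fashion, which the answer hints at but does not show; the names `tree_product` and `linear_product` are hypothetical.

```python
def tree_product(lo, hi):
    """Product of the integers lo..hi (inclusive), splitting the range in half each time."""
    if lo > hi:
        return 1
    if lo == hi:
        return lo
    mid = (lo + hi) // 2
    return tree_product(lo, mid) * tree_product(mid + 1, hi)

def linear_product(lo, hi):
    """Left-to-right product: one large accumulator times one small factor per step."""
    acc = 1
    for k in range(lo, hi + 1):
        acc *= k
    return acc

print(tree_product(1, 20) == linear_product(1, 20))
```

Both give the same value. The tree version multiplies operands of similar size, which big-integer libraries typically handle more efficiently than many multiplications of a huge number by a small one, and its recursion depth is only logarithmic in N.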
factorial(1, 1).
factorial(N, Result) :- M is N - 1,
factorial(M, NextResult), Result is NextResult * N.
The base case is declared first. The recursive clause requires that N be positive and multiplies N by the factorial of the previous term.
factorial(0, 1).
factorial(N, F) :-
N > 0,
Prev is N -1,
factorial(Prev, R),
F is R * N.
To run:
factorial(-1,X).
A simple way :
factorial(N, F):- N<2, F=1.
factorial(N, F) :-
M is N-1,
factorial(M,T),
F is N*T.
I would do something like:
fact(0, 1).
fact(N, Result):-
Next is N - 1,
fact(Next, Recursion),
Result is N * Recursion.
And a tail version would be like:
tail_fact(0, Acc, Acc). /* Base case of recursion: when N reaches zero, return Acc (so 0! = 1 when Acc starts at 1) */
tail_fact(N, Acc, Res):- /* calculated value so far always goes to Acc */
    N > 0, /* guard: keeps this clause from running past the base case on backtracking */
    NewAcc is N * Acc,
    NewN is N - 1,
    tail_fact(NewN, NewAcc, Res).
So for you to call the:
non-tail recursive method: fact(3, Result).
tail recursive method: tail_fact(3, 1, Result).
This might help ;)
non-tail recursion:
fact(0,1):-!.
fact(X,Y):- Z is X-1,
    fact(Z,NZ), Y is NZ*X.
tail recursion:
fact(X,F):- X>=0, fact_aux(X,F,1).
fact_aux(0,F,F):-!.
fact_aux(X,F,Acc):-
    NAcc is Acc*X, NX is X-1,
    fact_aux(NX,F,NAcc).
