Execution time orders of magnitude longer depending upon global definition location?

I finished writing the following program and began to do some cleanup after the debugging stage:
using BenchmarkTools
function main()
    global solution = 0
    global a = big"1"
    global b = big"1"
    global c = big"0"
    global total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end
@elapsed begin
    main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
The runtime for my machine was around 150ms.
I decided to rearrange the globals to better match the typical layout of a program, where globals are defined at the top:
using BenchmarkTools
global solution = 0
global a = big"1"
global b = big"1"
global c = big"0"
global total = 0
function main()
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end
@elapsed begin
    main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
Making that one change for where the globals were defined reduced the runtime on my machine to 0.0042ms.
Why is the runtime so drastically reduced?

Don't use globals.
Don't. Use. Globals. They are bad.
When you define your globals outside the main function, then the second time you run your function, a already equals 100, and main() bails out before doing anything at all.
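To make that concrete, here is a minimal sketch of the same effect on a much smaller loop. The names counter and count_to_three are made up for illustration and are not from the question; the global keyword on the first line is optional at top level and only there to mirror the question's style.

```julia
# Global state, initialised once when the script is loaded.
global counter = 0

function count_to_three()
    global counter          # the loop below reads and writes the global
    iterations = 0
    while counter < 3
        counter += 1
        iterations += 1
    end
    return iterations
end

first_run = count_to_three()    # the loop body runs until counter reaches 3
second_run = count_to_three()   # counter is still 3, so the loop is skipped
println((first_run, second_run))
```

The second call has nothing left to do, which is the same reason the timed second call of main() in the question finishes almost instantly.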
Global variables are a bad idea, not just in Julia, but in programming in general. You can use them when defining proper constants, like π, and maybe some other specialized cases, but not for things like this.
Let me rewrite your function without globals:
function main_locals()
    solution = 0
    a = 1
    while a < 100
        b = 1
        c = big(1)
        while b < 100
            c *= a
            s = string(c)
            total = sum(Int, s) - 48 * length(s)
            solution = max(solution, total)
            b += 1
        end
        a += 1
    end
    return solution
end
On my laptop this is >20x faster than your version with globals defined inside the function, that is, the version that actually works. The other one doesn't work as it should, so the comparison is not relevant.
Edit: I have even complicated this too much. The only thing you need to do is to remove all the globals from your first function, and return the solution, then it will work fine, and be almost as fast as the code I wrote:
function main_with_globals_removed()
    solution = 0
    a = big"1"
    b = big"1"
    c = big"0"
    total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                solution = total
            end
            b = b + 1
        end
        b = 1
        a = a + 1
    end
    return solution # remember return!
end
Don't use globals.

In the first case, you are always assigning the globals and possibly changing their types, hence the compiler needs to do extra work. I assume that the two programs generate different answers after the second run because of the failure to reset the globals.
Globals are discouraged in Julia for performance reasons because of potential type instability.
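As a rough illustration of the type-instability point, the sketch below compares what the compiler can infer for a function that reads an untyped global with one that receives the same value as an argument. The names x_global, add_one_global and add_one_arg are hypothetical. Base.return_types is an introspection helper rather than something you would call in production code, and the global keyword is optional at top level.

```julia
# An untyped, non-constant global: its type may change at any time.
global x_global = 1.0

add_one_global() = x_global + 1   # has to look up the global on every call
add_one_arg(x) = x + 1            # the argument type is known when compiling

# Ask the compiler which return type it can infer for each method.
inferred_global = Base.return_types(add_one_global, ())[1]
inferred_arg = Base.return_types(add_one_arg, (Float64,))[1]

println("reading a global: ", inferred_global)
println("taking an argument: ", inferred_arg)
```

The first result should be Any and the second Float64. Declaring the global const, or passing values as arguments as in the rewritten functions above, lets the compiler generate specialised code.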

Related

Multiple outputs used repeatedly in for loop Julia

I am using Julia and I've designed a for loop that takes the outputs of a function in one loop and uses them as the input of that function in the next loop (and over and over). When I run this code, Julia flags an "undefined" error, however, if I run the code in debug mode, it executes perfectly. For example, the code looks like this:
function do_command(a,b,c,d)
    a = a + 1
    b = split(b, keepempty=false)[1]
    c = split(b, keepempty=false)[1]
    if a == 1000
        d = true
    else
        d = false
    end
    return a, b, c, d
end
for ii in 1:length(x)
    if ii == 1
        a = 0
        b = "string something"
        c = ""
        d = false
    end
    a,b,c,d = do_command(a,b,c,d)
    if d == true
        print(string(b))
        break
    end
end
What am I doing wrong here?
An issue with your code is that for introduces a new scope for each iteration of the loop. That is to say: variable a created within the loop body at iteration 1 is not the same as variable a created within the loop body at iteration 2.
In order to fix your problem, you should declare variables outside the loop, so that at each iteration, references to them from within the loop body would actually refer to the same variables from the enclosing scope.
I'd go with something like this:
function do_command(a,b,c,d)
    a = a + 1
    b = split(b, keepempty=false)[1]
    c = split(b, keepempty=false)[1]
    if a == 1000
        d = true
    else
        d = false
    end
    return a, b, c, d
end
# Let's create a local scope: it's good practice to avoid global variables
let
    # All these variables are declared in the scope introduced by `let`
    a = 0
    b = "string something"
    c = ""
    d = false
    for ii in 1:10 #length(x)
        # now these names refer to the variables declared in the enclosing scope
        a,b,c,d = do_command(a,b,c,d)
        if d == true
            print(string(b))
            break
        end
    end
end

Abstract typing and multiple dispatch for functions in julia

I want objects to interact in specific ways depending on their type.
Example problem: I have four particles, two of type A and two of type B. When two type A particles interact I want to use the function
function interaction(parm1, parm2)
    return parm1 + parm2
end
When two type B particles interact I want to use the function
function interaction(parm1, parm2)
    return parm1 * parm2
end
When a type A particle interacts with a type B particle I want to use the function
function interaction(parm1, parm2)
    return parm1 - parm2
end
These functions are purposefully oversimplified.
I want to calculate a simple summation that depends on pairwise interactions:
struct part
    parm::Float64
end
# part I need help with:
# initialize a list of length 4, where the entries are `struct part`, and the abstract types
# are `typeA` for the first two and `typeB` for the second two. The values for the parm can be
# -1.0, 3, 4, 1.5 respectively
energy = 0.0
for i in range(length(particles)-1)
    for j = i+1:length(particles)
        energy += interaction(particles[i].parm, particles[j].parm)
    end
end
println(energy)
Assuming the parameters are particles[1].parm = -1, particles[2].parm = 3, particles[3].parm = 4 and particles[4].parm = 1.5, the energy should account for the following interactions:
(1,2) = -1 + 3 = 2
(1,3) = -1 - 4 = -5
(1,4) = -1 - 1.5 = -2.5
(2,3) = 3 - 4 = -1
(2,4) = 3 - 1.5 = 1.5
(3,4) = 4 * 1.5 = 6
energy = 1
Doing this with if statements is almost trivial but not extensible. I am after a clean, tidy Julia approach...
You can do this (I use the simplest form of the implementation, since it is enough in this case and, I hope, makes explicit what happens):
struct A
    parm::Float64
end
struct B
    parm::Float64
end
interaction(p1::A, p2::A) = p1.parm + p2.parm
interaction(p1::B, p2::B) = p1.parm * p2.parm
interaction(p1::A, p2::B) = p1.parm - p2.parm
interaction(p1::B, p2::A) = p1.parm - p2.parm # I added this rule, but you can leave it out and get a MethodError if such a case happens
function total_energy(particles)
    energy = 0.0
    for i in 1:length(particles)-1
        for j = i+1:length(particles)
            energy += interaction(particles[i], particles[j])
        end
    end
    return energy
end
particles = Union{A, B}[A(-1), A(3), B(4), B(1.5)] # Union makes sure things are compiled to be fast
total_energy(particles)
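Since the question title mentions abstract typing, here is a minimal sketch of a variant that groups the particle types under a common supertype. The names Particle, TypeA and TypeB are hypothetical and not part of the answer above, and the generic fallback method for mixed pairs is one possible design, not the only one.

```julia
# Hypothetical abstract supertype shared by all particle kinds.
abstract type Particle end

struct TypeA <: Particle
    parm::Float64
end

struct TypeB <: Particle
    parm::Float64
end

# Specific rules for like pairs; dispatch picks the most specific method.
interaction(p::TypeA, q::TypeA) = p.parm + q.parm
interaction(p::TypeB, q::TypeB) = p.parm * q.parm
# Fallback for any other pair of particles (here: the mixed A/B cases).
interaction(p::Particle, q::Particle) = p.parm - q.parm

function total_energy(particles)
    energy = 0.0
    for i in 1:length(particles)-1
        for j in i+1:length(particles)
            energy += interaction(particles[i], particles[j])
        end
    end
    return energy
end

particles = Particle[TypeA(-1), TypeA(3), TypeB(4), TypeB(1.5)]
energy = total_energy(particles)
println(energy)
```

With the parameters from the question this should reproduce the hand-computed total of 1. Adding a third particle kind then only requires a new struct and whatever like-pair or mixed-pair methods differ from the fallback.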
I have no idea how to do this in your language, but what you need is an analogue of what we call the strategy pattern in object-oriented programming. A strategy is a pluggable, reusable algorithm. In Java I'd make an interface like:
interface Interaction<A, B>
{
    double interact(A a, B b);
}
Then implement this three times and reuse those parts wherever you need things to interact. Another method can take an Interaction and use it without knowing how it's implemented. I think this is the effect you're after. Sorry I don't know how to translate this into your dialect.
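For what it's worth, a rough Julia counterpart of that idea is simply to pass the interaction as a function argument, since functions are ordinary values in Julia. This is only a sketch; pairwise_sum and its argument names are hypothetical.

```julia
# `interact` plays the role of the pluggable strategy: any two-argument function.
function pairwise_sum(interact, items)
    total = 0.0
    for i in 1:length(items)-1
        for j in i+1:length(items)
            total += interact(items[i], items[j])
        end
    end
    return total
end

additive = pairwise_sum(+, [1.0, 2.0, 3.0])         # (1+2) + (1+3) + (2+3)
multiplicative = pairwise_sum(*, [1.0, 2.0, 3.0])   # 1*2 + 1*3 + 2*3
println((additive, multiplicative))
```

This covers the case where one rule applies to every pair. When the rule depends on the types of both operands, as in the question, multiple dispatch selects the rule automatically.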

Julia NLopt force stops before the first iteration

I'm using NLopt for a constrained maximization problem. Regardless of the algorithm or start values, the optimization program is force stopped even before the first iteration (or so I assume because it gives me the initial value). I've attached my code here. I'm trying to find probabilities attached to a grid such that a function is maximized under some constraints. Any help is appreciated.
uk = x -> x^0.5
function objective(u,p,grd)
    -p'*u.(grd)
end
function c3(grd,p)
    c =[]
    d =[]
    for i=1:length(grd)
        push!(c,quadgk(x -> (i-x)*(x <= i ? 1 : 0),0,1)[1])
        push!(d,sum(p[1:i]'*(grd[1:i] .- grd[i])))
    end
    return append!(d-c,-p)
end
function c4(grd,p)
    return (grd .* p)-quadgk(x,0,1)
end
grd = n -> collect(0:1/n:1)
opt = Opt(:LD_SLSQP,11)
inequality_constraint!(opt, p -> c3(grd(10),p))
inequality_constraint!(opt, p -> -p)
equality_constraint!(opt, p -> sum(p)-1)
equality_constraint!(opt, p -> c4(grd(10),p))
opt.min_objective = p -> objective(-uk, p, grd(10))
k = push!(ones(11)*(1/11))
(minf,minx,ret) = optimize(opt, k)
I'm not a Julia developer, but as far as I know, if you need to exit a loop before it completes, a for loop is not your best choice; use a while loop with a sentinel variable instead.
Here is an article that explains how while loops with sentinels work,
and here is a Julia example that changes your for loop into a while loop with a sentinel that exits after the third iteration:
i = 1
third = 0
while i < length(grd) && third != 1
    # of course you need to change this; it is only an example that exits after the third iteration
    if i == 3
        third = 1
    end
    push!(c,quadgk(x -> (i-x)*(x <= i ? 1 : 0),0,1)[1])
    push!(d,sum(p[1:i]'*(grd[1:i] .- grd[i])))
    i += 1
end

Populating an array using a FOR loop and a function

I was expecting that the following code would populate E with random 1's and 0's, but that does not happen. I cannot figure out why.
Pkg.add("StatsBase")
using StatsBase
function randomSample(items,weights)
    sample(items, Weights(weights))
end
n = 10
periods = 100
p = [ones(n,periods)*0.5]
E = fill(NaN, (n,periods))
for i in 1:periods
    for ii in 1:n
        E(ii,i) = randomSample([1 0],[(p(ii,i)), 1 - p(ii,i)])
    end
end
E
The statement:
E(ii,i) = randomSample([1 0],[(p(ii,i)), 1 - p(ii,i)])
defines a local function E and is not an assignment operation to a matrix E. Use
E[ii,i] = randomSample([1, 0],[p[ii,i], 1 - p[ii,i]])
(I have fixed additional errors in your code so please check out the differences)
and for it to run you should also write:
p = ones(n,periods)*0.5
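Putting the fixes together, a self-contained sketch could look like the following. To keep it free of package dependencies it replaces the StatsBase sample call with a plain rand() comparison, which is a swapped-in technique and not what the question used; the array names are the ones from the question.

```julia
n = 10
periods = 100
p = fill(0.5, n, periods)    # a plain matrix, not a matrix wrapped in a vector
E = fill(NaN, n, periods)

for i in 1:periods
    for ii in 1:n
        # Square brackets index into the array. Writing E(ii, i) = ... would
        # instead define a method of a function named E.
        E[ii, i] = rand() < p[ii, i] ? 1 : 0
    end
end

println(sum(E), " ones out of ", length(E), " entries")
```

Each entry is 1 with probability p[ii, i] and 0 otherwise, so with p fixed at 0.5 roughly half of the entries should end up as 1.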

MATLAB: What happens for a global variable when running in the parallel mode?

What happens for a global variable when running in the parallel mode?
I have a global variable, "to_be_optimized_parameterIndexSet", which is a vector of indexes that should be optimized using gamultiobj and I have set its value only in the main script(nowhere else).
My code works properly in serial mode but when I switch to parallel mode (using "matlabpool open" and setting proper values for 'gaoptimset' ) the mentioned global variable becomes empty (=[]) in the fitness function and causes this error:
??? Error using ==> parallel_function at 598
Error in ==> PF_gaMultiFitness at 15 [THIS LINE: constants(to_be_optimized_parameterIndexSet) = individual;]
In an assignment A(I) = B, the number of elements in B and
I must be the same.
Error in ==> fcnvectorizer at 17
parfor (i = 1:popSize)
Error in ==> gamultiobjMakeState at 52
Score =
fcnvectorizer(state.Population(initScoreProvided+1:end,:),FitnessFcn,numObj,options.SerialUserFcn);
Error in ==> gamultiobjsolve at 11
state = gamultiobjMakeState(GenomeLength,FitnessFcn,output.problemtype,options);
Error in ==> gamultiobj at 238
[x,fval,exitFlag,output,population,scores] = gamultiobjsolve(FitnessFcn,nvars, ...
Error in ==> PF_GA_mainScript at 136
[x, fval, exitflag, output] = gamultiobj(@(individual)PF_gaMultiFitness(individual, initialConstants), ...
Caused by:
Failure in user-supplied fitness function evaluation. GA cannot continue.
I have checked all the code to make sure I have not changed this global variable anywhere else.
I have a quad-core processor.
Where is the bug? Any suggestions?
EDIT 1: The MATLAB code in the main script:
clc
clear
close all
format short g
global simulation_duration % PF_gaMultiFitness will use this variable
global to_be_optimized_parameterIndexSet % PF_gaMultiFitness will use this variable
global IC stimulusMoment % PF_gaMultiFitness will use these variables
[initialConstants IC] = oldCICR_Constants; %initialize state
to_be_optimized_parameterIndexSet = [21 22 23 24 25 26 27 28 17 20];
LB = [ 0.97667 0.38185 0.63529 0.046564 0.23207 0.87484 0.46014 0.0030636 0.46494 0.82407 ];
UB = [1.8486 0.68292 0.87129 0.87814 0.66982 1.3819 0.64562 0.15456 1.3717 1.8168];
PopulationSize = input('Population size? ') ;
GaTimeLimit = input('GA time limit? (second) ');
matlabpool open
nGenerations = inf;
options = gaoptimset('PopulationSize', PopulationSize, 'TimeLimit',GaTimeLimit, 'Generations', nGenerations, ...
    'Vectorized','off', 'UseParallel','always');
[x, fval, exitflag, output] = gamultiobj(@(individual)PF_gaMultiFitness(individual, initialConstants), ...
    length(to_be_optimized_parameterIndexSet),[],[],[],[],LB,UB,options);
matlabpool close
some other piece of code to show the results...
The MATLAB code of the fitness function, "PF_gaMultiFitness":
function objectives = PF_gaMultiFitness(individual, constants)
    global simulation_duration IC stimulusMoment to_be_optimized_parameterIndexSet
    %THIS FUNCTION RETURNS MULTI OBJECTIVES AND PUTS EACH OBJECTIVE IN A COLUMN
    constants(to_be_optimized_parameterIndexSet) = individual;
    [smcState , ~, Time]= oldCICR_CompCore(constants, IC, simulation_duration,2);
    targetValue = 1; % [uM]desired [Ca]i peak concentration
    afterStimulus = smcState(Time>stimulusMoment,14); % values of [Ca]i after stimulus
    peak_Ca_value = max(afterStimulus); % smcState(:,14) is [Ca]i
    if peak_Ca_value < 0.8 * targetValue
        objectives(1,1) = inf;
    else
        objectives(1, 1) = abs(peak_Ca_value - targetValue);
    end
    pkIDX = peakFinder(afterStimulus);
    nPeaks = sum(pkIDX);
    if nPeaks > 1
        peakIndexes = find(pkIDX);
        period = Time(peakIndexes(2)) - Time(peakIndexes(1));
        objectives(1,2) = 1e5* 1/period;
    elseif nPeaks == 1 && peak_Ca_value > 0.8 * targetValue
        objectives(1,2) = 0;
    else
        objectives(1,2) = inf;
    end
end
Global variables do not get passed from the MATLAB client to the workers executing the body of the PARFOR loop. The only data that does get sent into the loop body are variables that occur in the text of the program. This blog entry might help.
It really depends on the type of variable you're putting in. I need to see more of your code to point out the flaw, but in general it is good practice to avoid assuming complicated variables will be passed to each worker. In other words, anything more than a primitive may need to be reinitialized inside a parallel routine, or may need to have specific function calls (like using feval for function handles).
My advice: RTM
