I am working with a program which includes many function calls inside a for loop. In short, it looks something like this:
function something()
    ....
    ....
    timer = zeros(NSTEP);
    for it = 1:NSTEP # time steps
        tic = time_ns();
        Threads.@threads for p in 1:2 # start parallel run of the two sigma functions
            Threads.lock(l);
            Threads.unlock(l);
            arg_in_sig[p] = func_sig[p](arg_in_sig[p]);
        end
        .....
        .....
        Threads.@threads for p in 1:2
            Threads.lock(l)
            Threads.unlock(l)
            arg_in_vel[p] = func_vel[p](arg_in_vel[p])
        end
        toc = time_ns();
        timer[i] = toc - tic;
    end # time loop
    writedlm("timer.txt", timer)
    return
end
What I am trying to do is measure the time each loop iteration takes, saving the results in an output file called "timer.txt". The thing is that it doesn't work:
it saves a file with all zeros in it (except two or three values, which is even more confusing).
I made a toy example like:
using DelimitedFiles;
function test()
    a = zeros(1000)
    for i = 1:1000
        tic = time_ns();
        C = rand(20,20)*rand(20,20);
        toc = time_ns();
        a[i] = toc - tic;
    end
    writedlm("aaa.txt", a);
    return a;
end
and this actually works (it saves fine!). Does it have something to do with the fact that I am using Threads.@threads? What could be happening between writedlm() and time_ns() in my program?
Any help would be much appreciated!
You are iterating over it but trying to save with:
timer[i] = toc-tic;
while it should be
timer[it] = toc-tic;
Perhaps you have some i in global scope, and hence the code still runs without an error.
Additionally, locking and then immediately unlocking does not seem to make much sense. Moreover, since you iterate over p, which is also the index of the Vector cell where you save the result, there is no need for the locking mechanism at all (unless you are calling some functions that depend on global state).
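Putting it together, a minimal sketch of the corrected timing loop (assuming NSTEP, func_sig, arg_in_sig, func_vel and arg_in_vel are defined as in your program) might look like:
using DelimitedFiles

timer = zeros(NSTEP)
for it = 1:NSTEP                      # time steps
    tic = time_ns()
    Threads.@threads for p in 1:2     # each p writes to its own cell, so no lock is needed
        arg_in_sig[p] = func_sig[p](arg_in_sig[p])
    end
    Threads.@threads for p in 1:2
        arg_in_vel[p] = func_vel[p](arg_in_vel[p])
    end
    toc = time_ns()
    timer[it] = toc - tic             # index with the loop variable it, not i
end
writedlm("timer.txt", timer)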
I'm trying to train a UNet in Julia with the help of Flux.
Flux.train!(loss, Flux.params(model), train_data_loader, opt)
batch_loss = loss(train_data, train_targets)
where the loss is
logitcrossentropy
and train_data_loader is
train_data_loader = DataLoader((train_data |> device, train_targets |> device), batchsize=batch_size, shuffle=true)
I don't understand how to get the loss out of Flux.train! for printing (is that the validation loss?). Using evalcb would also trigger a call to compute the loss, so it is no different. I want to skip the extra calculation.
So what I did is call the loss function again, store it in a variable, and then print it per batch. Is there a way to print the loss from Flux.train!() instead of calling the loss function again?
Instead of altering train! as @Tomas suggested, the loss function can be instrumented to log its return value. Printing during the calculation sounds like a bad idea for decent performance, so I've made an example where the loss is logged into a global vector:
using ChainRulesCore
# returns another loss function which is the same as the function
# passed in, but push!es each return value into the provided
# `history` vector
function logged_loss(lossfn, history)
    return function _loss(args...)
        err = lossfn(args...)
        ignore_derivatives() do
            push!(history, err)
        end
        return err
    end
end
# initialize log vector
log_vec = Float32[]
# use function above to create logging loss function
newloss = logged_loss(loss, log_vec)
# run the training
Flux.train!(newloss, Flux.params(W, b), train_data, opt)
At this point, log_vec should contain a record of the return values from the loss function. This is a rough solution which uses an annoying global variable. How to interpret the logged loss values also depends on the nature of the optimizer. In my test there was one call per epoch, and it returned a decreasing loss until convergence. [This answer incorporates suggestions from @darsnack]
Note: since log_vec is captured by the loss function, to clear the log it must not be reassigned but emptied in place with empty!(log_vec).
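For example, a minimal illustration reusing the names defined above:
empty!(log_vec)   # drop all logged values but keep the same vector object
Flux.train!(newloss, Flux.params(W, b), train_data, opt)   # log_vec fills up again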
Adding to @Dan's answer, you can also augment your loss function with logging on the fly using the do syntax:
using ChainRules
loss_history = Float32[]
Flux.train!(Flux.params(model), train_data_loader, opt) do x, y
    err = loss(x, y)
    ChainRules.ignore_derivatives() do
        push!(loss_history, err)
    end
    return err
end
You would need to write your own version of Flux.train! using withgradient instead of the gradient function. withgradient gives you the output of the loss (or, more precisely, of the function you are differentiating). Flux.train! (https://github.com/FluxML/Flux.jl/blob/8bc0c35932c4a871ac73b42e39146cd9bbb1d446/src/optimise/train.jl#L123) is literally a few lines of code, therefore adapting it to your needs is very easy.
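For illustration, a rough sketch of such a customized loop (not the exact Flux implementation; it assumes a Flux version that provides withgradient, the implicit Flux.params style used above, and that each element of the data iterator matches the arguments of loss):
using Flux

function my_train!(loss, ps, data, opt)
    losses = Float32[]
    for d in data
        # withgradient returns the loss value together with the gradients
        val, grads = Flux.withgradient(() -> loss(d...), ps)
        push!(losses, val)
        Flux.Optimise.update!(opt, ps, grads)
    end
    return losses
end

loss_history = my_train!(loss, Flux.params(model), train_data_loader, opt)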
While memoization of a function is a good idea, it could cause a program to crash because the program could potentially run out of memory.
Therefore it is NOT A SAFE OPTION to use in a production program.
Instead, I have developed caching with a fixed number of memory slots, with a soft limit and a hard limit. When the number of cached slots exceeds the hard limit, the least-used slots are deleted until the number of slots is reduced to the soft limit.
struct cacheType
    softlimit::Int
    hardlimit::Int
    memory::Dict{Any,Any}
    freq::Dict{Any,Int}
    cacheType(soft::Int, hard::Int) = new(soft, hard, Dict(), Dict())
end

function tidycache!(c::cacheType)
    memory_slots = length(c.memory)
    if memory_slots > c.hardlimit
        num_to_delete = memory_slots - c.softlimit
        # Now sort the freq dictionary into an array of key => AccessFrequency
        # where the first few items have the lowest AccessFrequency
        for item in sort(collect(c.freq), by = x -> x[2])[1:num_to_delete]
            delete!(c.freq, item[1])
            delete!(c.memory, item[1])
        end
    end
end

# Cached Fibonacci function
function cachefib!(cache::cacheType, x)
    if haskey(cache.memory, x)
        # Increment the number of times this key has been accessed
        cache.freq[x] += 1
        return cache.memory[x]
    else
        # Perform housekeeping and remove cache entries if over the hard limit
        tidycache!(cache)
        if x < 3
            cache.freq[x] = 1
            return cache.memory[x] = 1
        else
            result = cachefib!(cache, x-2) + cachefib!(cache, x-1)
            cache.freq[x] = 1
            cache.memory[x] = result
            return result
        end
    end
end
c = cacheType(3,4)
cachefib!(c,3)
cachefib!(c,4)
cachefib!(c,5)
cachefib!(c,6)
cachefib!(c,4)
println("c.memory is ",c.memory)
println("c.freq is ",c.freq)
I think this would be more useful in a production environment than plain memoization with no limit on memory consumption, which could result in the program crashing.
In the Python language, they have
@functools.lru_cache(maxsize=128, typed=False)
Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.
Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.
Is there an equivalent in Julia language?
There is LRUCache.jl, which provides an LRU type which basically acts like a Dict. Unfortunately, this doesn't seem to work with the Memoize.jl package, but you can use my answer to your other question:
using LRUCache
const fibmem = LRU{Int,Int}(3) # store only 3 values
function fib(n)
    get!(fibmem, n) do
        n < 3 ? 1 : fib(n-1) + fib(n-2)
    end
end
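A quick check of the behaviour (assuming LRUCache.jl is installed and that its LRU type supports length like a Dict):
julia> fib(20)
6765

julia> length(fibmem)   # the cache never holds more than 3 entries
3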
When I run the following code, I get a deprecation warning saying produce has been replaced with channels.
function source(dir)
    filelist = readdir(dir)
    for filename in filelist
        name, ext = splitext(filename)
        if ext == ".jld"
            produce(filename)
        end
    end
end
path = "somepathdirectoryhere"
for fname in Task(source(path))
    println(fname)
end
I cannot find an example on how to do this with channels. I've tried creating a global channel and using put! instead of produce with no luck.
Any ideas?
Here's one way. Modify your function to accept a channel argument, and put! data in it:
function source(dir, chnl)
    filelist = readdir(dir)
    for filename in filelist
        name, ext = splitext(filename)
        if ext == ".jld"
            put!(chnl, filename) # this blocks until "take!" is used elsewhere
        end
    end
end
Then create your task implicitly using the Channel constructor (which takes a function with a single argument representing the channel, so we need to wrap the source function in an anonymous function):
my_channel = Channel( (channel_arg) -> source( pwd(), channel_arg) )
Then either check that the channel is still open (i.e. the task hasn't finished) and, if so, take a value from it:
julia> while isopen( my_channel)
take!( my_channel) |> println;
end
no.jld
yes.jld
or, use the channel itself as an iterator (iterating over Tasks is becoming deprecated, along with the produce / consume functionality)
julia> for i in my_channel
i |> println
end
no.jld
yes.jld
Alternatively you can use @schedule with bind etc. as per the documentation, but the above seems like the most straightforward way.
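For completeness, a rough sketch of that alternative (using @async, which replaced @schedule in later Julia versions, together with bind so the channel is closed when the producer finishes):
my_channel = Channel{String}(32)            # buffered channel of filenames
producer = @async source(pwd(), my_channel) # run the producer as a task
bind(my_channel, producer)                  # close the channel once the task is done

for fname in my_channel
    println(fname)
end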
I came up with two ways to add all the arrays in a pair of composite types. The first way (add_structs_1) takes 4 seconds to run and the second way (add_structs_2) takes 0.15 seconds. But the second way requires a lot more code... I have to explicitly mention each field of the composite type. Is there a way to get the efficiency of add_structs_2 without explicitly listing each field?
type SampleStruct
    a::Vector{Float64}
    k::Matrix{Float64}
    e_axis::Vector{Float64}
    e_dev::Vector{Float64}
    e_scale::Vector{Float64}
end
function add_structs_1(tgt::SampleStruct, src::SampleStruct)
    for n in names(SampleStruct)
        for i in 1:length(tgt.(n))
            tgt.(n)[i] += src.(n)[i]
        end
    end
end
function add_structs_2(tgt::SampleStruct, src::SampleStruct)
    for i in 1:length(tgt.a)
        tgt.a[i] += src.a[i]
    end
    for i in 1:length(tgt.k)
        tgt.k[i] += src.k[i]
    end
    for i in 1:length(tgt.e_axis)
        tgt.e_axis[i] += src.e_axis[i]
    end
    for i in 1:length(tgt.e_dev)
        tgt.e_dev[i] += src.e_dev[i]
    end
    for i in 1:length(tgt.e_scale)
        tgt.e_scale[i] += src.e_scale[i]
    end
end
function time_add_structs(f::Function)
    src = SampleStruct(ones(3), ones(3,3), [1.], [1.], [1.])
    tgt = SampleStruct(ones(3), ones(3,3), [1.], [1.], [1.])
    @time for i in 1:1000000
        f(tgt, src)
    end
end
time_add_structs(add_structs_1)
time_add_structs(add_structs_1)
time_add_structs(add_structs_2)
time_add_structs(add_structs_2)
time_add_structs(add_structs_3)
time_add_structs(add_structs_3)
A more Julian approach to add_structs_1 is to make the inner loop a separate function; this allows the compiler to specialize the function on each array type in the SampleStruct and gives quite a speedup.
Profiling the code showed that the time to execute names(SampleStruct) was quite significant, and it would be executed on every call in your benchmark; by making it a global constant some time is gained, and the function now looks like:
function add_array(a::AbstractArray, b::AbstractArray)
    for i in 1:length(a)
        a[i] += b[i]
    end
end

const names_in_struct = names(SampleStruct)
function add_structs_3(tgt::SampleStruct, src::SampleStruct)
    for n in names_in_struct
        add_array(tgt.(n), src.(n))
    end
end
The function is now within a factor of four of add_structs_2.
The metaprogramming approach is more complicated but gives the same performance as add_structs_2:
ex = Any[]
for n in names(SampleStruct)
    t = Expr(:.,:tgt, QuoteNode(n))
    s = Expr(:.,:src, QuoteNode(n))
    e = quote
        for i in 1:length($t)
            $t[i] += $s[i]
        end
    end
    push!(ex, e)
end

eval(quote
    function add_structs_4(tgt::SampleStruct, src::SampleStruct)
        $(Expr(:block, ex...))
    end
end)
Each of those for loops could be replaced with a one-liner, making the long version just this:
function add_structs_3(tgt::SampleStruct, src::SampleStruct)
    tgt.a[:] += src.a
    tgt.k[:,:] += src.k
    tgt.e_axis[:] += src.e_axis
    tgt.e_dev[:] += src.e_dev
    tgt.e_scale[:] += src.e_scale
end
This is the same length as add_structs_1 but slower because it actually builds a temporary array and then does the assignment. You could also use some metaprogramming to generate the longer code.
An approach that should get all the performance of the best case is to combine Daniel's and Stefan's answers: define the array addition as a separate function, just like in Daniel's solution, and instead of iterating over the names, list each field manually, like in Stefan's answer.
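A minimal sketch of that combination (add_structs_5 is just an illustrative name; it reuses the add_array helper defined in the answer above and lists the fields explicitly, so the compiler can specialize on each array type):
function add_structs_5(tgt::SampleStruct, src::SampleStruct)
    # add_array is the element-wise in-place addition defined earlier
    add_array(tgt.a,       src.a)
    add_array(tgt.k,       src.k)
    add_array(tgt.e_axis,  src.e_axis)
    add_array(tgt.e_dev,   src.e_dev)
    add_array(tgt.e_scale, src.e_scale)
end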
Below is the code for my program. I'm attempting to find the value of the integral of 1/ln(x), and then evaluate the integral from 0 to x, with this as the integrand. I'm not exactly sure what I'm doing wrong, but I am quite new to Scilab.
t = input("t");
x=10; while x<t, x=x+10,
function y=f(x), y=(1/(log (x))), endfunction
I=intg(2,x,f);
function z=g(x), z=I, endfunction
W = intg(0,x,z);
W
end
I'm not entirely sure what you are trying to achieve, but I have reformatted your code and added some suggestions and documentation.
Maybe it will help you find the answer.
While loop
You can convert your while loop to a for loop
Your code
x=10;
while x<t
    x=x+10
    //some code
end
Could be
for x=10:10:t
    //some code
end
Functions
In your code, you redeclare the two functions in every single iteration of the while loop. You could declare them outside the loop and call them inside it.
Reformatted
t = input("Please provide t: ");

// The function 1/ln(x)
function y=f(x), y=1/log(x), endfunction

// Every time g(x) is called, the current value of I is returned
function z=g(x), z=I, endfunction

for x=10:10:t
    //Find the definite integral of f from 2 to x
    I = intg(2,x,f);
    //Find the definite integral of g (the constant I) from 0 to x
    W = intg(0,x,g);
    disp( string(W) );
end
I know the question is probably outdated, but the topic is still active, and I was looking for code with a double integral.
Here it looks strange to use "intg" just to calculate the area of the rectangle defined by its diagonal ((0,0), (x,I)): the result is just x*I...
Maybe the initial aim was to consider "I" as a function of "x" (but in this case there is a convergence problem at x=1...); so restricting the integration of "I" to something above 1 gives the following code:
x=10:10:100;W2=integrate('integrate(''1/log(x2)'',''x2'',2,x1)','x1',1.001,x);
Note the use of integration variables x1 and x2, plus the use of quotes...