I'm trying to train a UNet in Julia with the help of Flux.
Flux.train!(loss, Flux.params(model), train_data_loader, opt)
batch_loss = loss(train_data, train_targets)
where the loss is
and train_data_loader is
train_data_loader = DataLoader((train_data |> device, train_targets |> device), batchsize=batch_size, shuffle=true)
I don't understand how to get the loss out of Flux.train! for printing (is that the validation loss?). An evalcb callback would also trigger a separate loss calculation, so it's no different. I want to skip the extra calculation.
So what I did is call the loss function again, store the result in a variable, and print it per batch. Is there a way to print the loss from Flux.train! instead of calling loss again?
Instead of altering train! as @Tomas suggested, the loss function can be instrumented to log its return value. Printing during the calculation sounds like a bad idea for performance, so I've made an example where the loss is logged into a global vector:
using ChainRulesCore
# returns another loss function which behaves like the function
# passed in, but also push!es each return value into `history`
function logged_loss(lossfn, history)
    return function _loss(args...)
        err = lossfn(args...)
        # keep the logging out of the gradient computation
        ignore_derivatives() do
            push!(history, err)
        end
        return err
    end
end
# initialize log vector
log_vec = Float32[]
# use function above to create logging loss function
newloss = logged_loss(loss, log_vec)
# run the training
Flux.train!(newloss, Flux.params(W, b), train_data, opt)
At this point, log_vec should contain a record of the values returned by the loss function. This is a rough solution, which uses annoying global variables. Interpreting the logged values also depends on the nature of the optimizer. In my test, there was one call per epoch and it returned a decreasing loss until convergence. [This answer incorporates suggestions from @darsnack]
Note: since log_vec is captured by the loss function, it must not be reassigned to clear the log; empty it in place with empty!(log_vec).
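The same pattern, and the reason the log has to be emptied in place, can be sketched in Python (used here only for illustration; this is not Flux code). `logged_loss` mirrors the Julia function above, and `squared_error` is a hypothetical stand-in for the real loss. The wrapper keeps a reference to the list it was given, so rebinding the name to a fresh list would leave the wrapper writing to the old one:

```python
def logged_loss(lossfn, history):
    """Return a wrapper around lossfn that records every value it returns."""
    def _loss(*args):
        err = lossfn(*args)
        history.append(err)  # side effect only; the returned value is unchanged
        return err
    return _loss


def squared_error(prediction, target):
    # Hypothetical stand-in for the real loss function.
    return (prediction - target) ** 2


log_vec = []
newloss = logged_loss(squared_error, log_vec)
newloss(3.0, 1.0)
newloss(2.0, 1.0)
recorded = list(log_vec)      # snapshot of the two logged values

log_vec.clear()               # empties the very list the wrapper writes to
newloss(1.5, 1.0)
after_clear = list(log_vec)   # only the value logged after clearing
```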
Adding to @Dan's answer, you can also augment your loss function with logging on the fly using the do syntax:
using ChainRules
loss_history = Float32[]
Flux.train!(Flux.params(model), train_data_loader, opt) do x, y
    err = loss(x, y)
    # record the value without affecting the gradient
    ChainRules.ignore_derivatives() do
        push!(loss_history, err)
    end
    return err
end
You would need to write your own version of Flux.train! using withgradient instead of the gradient function. withgradient also gives you the output of the loss (or, to be more precise, of the function you are differentiating). Flux.train! (https://github.com/FluxML/Flux.jl/blob/8bc0c35932c4a871ac73b42e39146cd9bbb1d446/src/optimise/train.jl#L123) is literally a few lines of code, so adapting it to your needs is very easy.
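A rough sketch of that idea, in Python rather than Julia and without any Flux API: a hand-written training loop in which one helper returns both the loss and its gradient, so the loss can be logged with no second forward pass. The model (a single weight `w` fitting `y = w * x`), the helper `loss_and_grad`, and the data are all made up for illustration.

```python
def loss_and_grad(w, batch):
    """Mean squared error of y ~ w * x and its derivative with respect to w."""
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (w * x - y) * x for x, y in batch) / n
    return loss, grad


def train(w, batches, learning_rate=0.05, epochs=50):
    history = []
    for _ in range(epochs):
        for batch in batches:
            loss, grad = loss_and_grad(w, batch)  # one pass gives both values
            history.append(loss)                  # log without recomputing
            w -= learning_rate * grad             # plain gradient descent step
    return w, history


# Hypothetical data generated by y = 2 * x, split into two batches.
batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, history = train(0.0, batches)
```

Here `history` holds one entry per batch, which is what the question asks to print.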
I've been writing this quicksort function in R trying to incorporate a partition function I've created as well. However, I've been encountering bugs when comparing p and r. It keeps telling me my argument is of length 0, however, I thought I declared the p and r objects when I initially called the quicksort function.
partition <- function(input, p, r) {
  pivot <- input[r]
  while (input[p] < pivot) { p <- p + 1 }
  while (input[r] > pivot) { r <- r - 1 }
  if (input[p] == input[r]) {
    p <- p + 1
  } else if (p < r) {
    tmp <- input[p]
    input[p] <- input[r]
    input[r] <- tmp
  }
}
quicksort <- function(input, p, r) {
  if (p < r) {
    j <- partition(input, p, r)
    input <- quicksort(input, p, j - 1)
    input <- quicksort(input, j + 1, r)
  }
}
input <- c(500, 700, 800, 100, 300, 200, 900, 400, 1000, 600)
The error in question is caused because input[p] is of length zero. Why? Because in this instance input is NULL. input isn't NULL for the first few goes, so what would make it NULL?
Your quicksort function is designed to take an input, change it (if p<r), and then output it. But, you've left out the output step. If p<r then this is taken care of implicitly by the last input <- ... line, but if not then the function doesn't do anything and just returns NULL.
The output from one call to quicksort is the input to the next, and so this NULL propagates and breaks the next call.
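To illustrate the fix, here is a minimal sketch in Python rather than R. It is not the asker's partition scheme: it swaps in the standard Lomuto partition, uses 0-based indices, and copies the list in `partition` to imitate R's pass-by-value behaviour. The point is only that `quicksort` returns its (possibly updated) input on every path, including the one where `p < r` is false:

```python
def partition(values, p, r):
    """Lomuto partition of values[p..r]; returns (new_list, pivot_index)."""
    values = list(values)          # work on a copy, as R does with its arguments
    pivot = values[r]
    i = p - 1
    for j in range(p, r):
        if values[j] <= pivot:
            i += 1
            values[i], values[j] = values[j], values[i]
    values[i + 1], values[r] = values[r], values[i + 1]
    return values, i + 1


def quicksort(values, p, r):
    if p < r:
        values, j = partition(values, p, r)
        values = quicksort(values, p, j - 1)
        values = quicksort(values, j + 1, r)
    return values                  # returned on every path, including p >= r


data = [500, 700, 800, 100, 300, 200, 900, 400, 1000, 600]
result = quicksort(data, 0, len(data) - 1)
```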
Recursive functions are beautiful but often frustrating to debug. I recommend liberally sprinkling print() statements around while you're still developing it so you can see what it's doing more easily.
I am working with a program which includes many function calls inside a for loop. In short, it is something like this:
function something()
    timer = zeros(NSTEP);
    for it = 1:NSTEP # time steps
        tic = time_ns();
        Threads.@threads for p in 1:2 # Start the two sigma functions in parallel
            arg_in_sig[p] = func_sig[p](arg_in_sig[p]);
        end
        Threads.@threads for p in 1:2
            arg_in_vel[p] = func_vel[p](arg_in_vel[p])
        end
        toc = time_ns();
        timer[i] = toc-tic;
    end # time loop
end
What I am trying to do is measure the time each loop iteration takes, saving the result in an output file called "timer.txt". The problem is that it doesn't work.
It saves a file with all zeros on it (except two or three values, which is more confusing).
I made a toy example like:
using DelimitedFiles;
function test()
    a = zeros(1000); # one slot per iteration
    for i=1:1000
        tic = time_ns();
        C = rand(20,20)*rand(20,20);
        toc = time_ns();
        a[i] = toc-tic;
    end
    return a;
end
and this actually works (it saves fine!). Does it have something to do with the fact that I am using Threads.@threads? What could be happening between writedlm() and time_ns() in my program?
Any help would be much appreciated!
You are iterating over it but try to save by:
timer[i] = toc-tic;
while it should be
timer[it] = toc-tic;
Perhaps you have some i in global scope and hence the code still works.
Additionally, locking the thread and immediately unlocking it does not seem to make much sense. Moreover, since p, the variable you iterate over, is also the index of the Vector cell where you save the results, there is no need to use a locking mechanism at all (unless you are calling functions that depend on global state).
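Both points can be sketched in Python (not Julia, and only as an illustration): the timing slot is indexed with the loop variable itself, and each parallel task produces the value for its own slot, so no lock is involved. `NSTEP`, `step`, and the starting values are hypothetical stand-ins for the names in the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NSTEP = 5  # hypothetical number of time steps


def step(value):
    # Hypothetical stand-in for func_sig[p] / func_vel[p].
    return value + 1


def run():
    timer = [0] * NSTEP
    args = [0, 10]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for it in range(NSTEP):
            tic = time.perf_counter_ns()
            # Each task computes the value for its own slot, so no lock is needed.
            args = list(pool.map(step, args))
            toc = time.perf_counter_ns()
            timer[it] = toc - tic  # index with the loop variable `it`
    return timer, args


timer, args = run()
```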
I'm creating a function to be minimized, basically a function of x1 that returns the value cmc. I would also like it to return an intermediate value w for later use. I just learnt that to create a function returning multiple values you have to make a list, or use setClass (which I'm not very clear about, so I did not use it). The executable code is
t1=1; t2=0.6; x2=1;
f<-function(x) c(x/(t2+x),-t1*x/(t2+x)^2)
The output is not very decent but acceptable. The problem is I cannot optimize a function with such output. So now I am doing
t1=1; t2=0.6; x2=1;
f<-function(x) c(x/(t2+x),-t1*x/(t2+x)^2)
This is very redundant. It's OK when the problem is as simple as this one, but obviously not good when things become complicated. Is there a way to create a function with multiple return values that still lets you designate a primary value to be optimized? I remember some base functions are like that: they give you various outputs but you can still work with a primary value.
You can just wrap this function in another function that returns only the intended output, such as
t1=1; t2=0.6; x2=1;
f<-function(x) c(x/(t2+x),-t1*x/(t2+x)^2)
> optimize(function(x) phi_c(x)[[1]], lower = 0, upper = 5)
$minimum
[1] 4.999922

$objective
         [,1]
[1,] 37.12268
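The wrapping idea is independent of R. A minimal sketch in Python, under stated assumptions: `phi_c` here is a made-up objective (not the asker's function) that returns the value to minimize together with an extra value, and `golden_section_minimize` is a simple hand-written stand-in for a one-dimensional optimizer, not a reimplementation of R's optimize. The optimizer only ever sees the first return value, while the full output stays available afterwards:

```python
import math


def phi_c(x):
    """Hypothetical objective: returns the value to minimize plus an extra."""
    cmc = (x - 2.0) ** 2 + 1.0
    w = 3.0 * x               # intermediate value kept for later use
    return cmc, w


def golden_section_minimize(f, lower, upper, tol=1e-6):
    """Minimize a unimodal scalar function f on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d             # the minimum lies in [a, d]
        else:
            a = c             # the minimum lies in [c, b]
    return (a + b) / 2.0


# The wrapper passes only the first return value to the optimizer.
x_best = golden_section_minimize(lambda x: phi_c(x)[0], 0.0, 5.0)
cmc_best, w_best = phi_c(x_best)   # the full output is still available
```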
I want to add a loss function to torch that calculates the edit distance between predicted and target values.
Is there an easy way to implement this idea?
Or do I have to write my own class with backward and forward functions?
If your criterion can be represented as a composition of existing modules and criteria, it's a good idea to simply construct such composition using containers. The only problem is that standard containers are designed to work with modules only, not criteria. The difference is in :forward method signature:
module:forward(input)
criterion:forward(input, target)
Luckily, we are free to define our own container which is able to work with criteria too. For example, a sequential one:
local GeneralizedSequential, _ = torch.class('nn.GeneralizedSequential', 'nn.Sequential')

function GeneralizedSequential:forward(input, target)
   return self:updateOutput(input, target)
end

function GeneralizedSequential:updateOutput(input, target)
   local currentOutput = input
   for i=1,#self.modules do
      -- pass the target along so that criteria in the chain can use it
      currentOutput = self.modules[i]:updateOutput(currentOutput, target)
   end
   self.output = currentOutput
   return currentOutput
end
Below is an illustration of how to implement nn.CrossEntropyCriterion using this generalized sequential container:
function MyCrossEntropyCriterion(weights)
   local criterion = nn.GeneralizedSequential()
   criterion:add(nn.LogSoftMax())
   criterion:add(nn.ClassNLLCriterion(weights))
   return criterion
end
Check whether everything is correct:
output = torch.rand(3,3)
target = torch.Tensor({1, 2, 3})
mycrit = MyCrossEntropyCriterion()
-- print(mycrit)
print(mycrit:forward(output, target))
print(mycrit:backward(output, target))
crit = nn.CrossEntropyCriterion()
-- print(crit)
print(crit:forward(output, target))
print(crit:backward(output, target))
Just to add to the accepted answer, you have to be careful that the loss function you define (edit distance in your case) is differentiable with respect to the network parameters.
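To see why differentiability is a concern here, consider a plain Levenshtein edit distance, sketched below in Python (not Torch/Lua, and not tied to any particular network). It works on discrete symbols and returns an integer count of edits, so as written it provides no gradient with respect to real-valued network outputs:

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```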
I have read that it is in principle possible to convert a recursive function into an iterative one. I have a bunch of functions calling each other. I built the structure of the code from my flowchart, and writing it in a recursive style was the obvious choice. It runs fine for small problems but gives a segmentation fault at larger scale. So I am trying to switch to an iterative style, but I cannot see how to do it technically because the branching structure confuses me. Can someone give me a clue on how to handle it? The code is something like this in Python:
def main_function(parameters):
    if condition0:
        if condition1:
            if condition2:
                return function1(parameters)
            else:
                return function2(parameters)
        else:
            return function1(parameters)
    else:
        return function2(parameters)

def function1(parameters):
    if condition3:
        return function3(parameters)  ### yet another function.. so messed up? :-(((
    else:
        return main_function(parameters)

def function2(parameters):
    if condition4:
        return main_function(parameters)
    else:
        return function1(parameters)

def function3(parameters):
    if condition5:
        if condition6:
            return function3(parameters)
        else:
            return main_function(parameters)
    else:
        return RESULTS  # The only way out!
Any idea would be greatly appreciated, thank you very much in advance.
Since every return statement that you've shown is essentially a return some_other_function(), it seems that a state machine would be a natural way to model this. There would be a state corresponding to each function, and the return statements would become state transitions.
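A minimal sketch of such a state machine follows. The conditions are invented purely so that the example runs and terminates (the real `condition0` to `condition6` are not known); `parameters` is reduced to a single integer `n`, and the state names simply reuse the function names from the question. Each loop iteration performs one transition that used to be a recursive call:

```python
def run_state_machine(n):
    """Drive the mutually recursive calls as explicit state transitions."""
    state = "main"
    steps = 0
    while True:
        steps += 1
        if state == "main":
            state = "function1" if n % 2 == 0 else "function2"
        elif state == "function1":
            n -= 1
            state = "function3" if (n <= 0 or n % 3 == 0) else "main"
        elif state == "function2":
            n -= 1
            state = "main" if n % 5 == 0 else "function1"
        else:  # state == "function3"
            if n <= 0:
                return n, steps      # the only way out
            n -= 1
            state = "function3" if n % 7 == 0 else "main"


# Far more transitions than Python's default recursion limit would allow.
final_n, steps = run_state_machine(100_000)
```

Because the loop keeps only the current state and the current parameters, the depth of the original call chain no longer matters.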
Since every recursive call is made in a return statement, you don't need to keep the old stack frame. For example, when function1() executes return function3(), function1's stack frame can be discarded. This way you won't get RuntimeError: maximum recursion depth exceeded.
You can achieve this by returning the successive function to call with parameters, instead of calling recursively.
def main_function(parameters):
    if condition0:
        if condition1:
            if condition2:
                return function1, parameters  # return function to call next with arguments
            else:
                return function2, parameters
        else:
            return function1, parameters
    else:
        return function2, parameters
You should change the other functions in a similar way. Now you can call main_function() as follows:
next_function, next_fun_param = main_function(parameters)
while hasattr(next_function, '__call__'):
    next_function, next_fun_param = next_function(next_fun_param)
# got the RESULT
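A complete, runnable sketch of this approach is shown here, with invented conditions and an integer standing in for `parameters`. One detail is an assumption that the snippet above leaves open: the terminal step returns `(None, result)` so that the two-value unpacking in the loop still works when there is no next function to call.

```python
def main_function(n):
    if n % 2 == 0:
        return function1, n
    return function2, n


def function1(n):
    if n <= 0:
        return None, n          # no next function: n is the final result
    return main_function, n - 1


def function2(n):
    return function1, n - 1


def trampoline(function, argument):
    """Keep calling the returned function until a step returns no function."""
    while callable(function):
        function, argument = function(argument)
    return argument


# 50,000 chained steps would overflow the stack if written as direct recursion.
result = trampoline(main_function, 50_000)
```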