Add my custom loss function to torch

I want to add a loss function to torch that calculates the edit distance between predicted and target values.
Is there an easy way to implement this idea?
Or do I have to write my own class with backward and forward functions?

If your criterion can be represented as a composition of existing modules and criteria, it's a good idea to simply construct such a composition using containers. The only problem is that standard containers are designed to work with modules only, not criteria. The difference is in the :forward method signature:
module:forward(input)
criterion:forward(input, target)
Luckily, we are free to define our own container which is able to work with criteria too. For example, a generalized Sequential:
local GeneralizedSequential, _ = torch.class('nn.GeneralizedSequential', 'nn.Sequential')

function GeneralizedSequential:forward(input, target)
    return self:updateOutput(input, target)
end

function GeneralizedSequential:updateOutput(input, target)
    local currentOutput = input
    for i = 1, #self.modules do
        currentOutput = self.modules[i]:updateOutput(currentOutput, target)
    end
    self.output = currentOutput
    return currentOutput
end
Below is an illustration of how to implement nn.CrossEntropyCriterion using this generalized sequential container:
function MyCrossEntropyCriterion(weights)
    local criterion = nn.GeneralizedSequential()
    criterion:add(nn.LogSoftMax())
    criterion:add(nn.ClassNLLCriterion(weights))
    return criterion
end
Check whether everything is correct:
output = torch.rand(3,3)
target = torch.Tensor({1, 2, 3})
mycrit = MyCrossEntropyCriterion()
-- print(mycrit)
print(mycrit:forward(output, target))
print(mycrit:backward(output, target))
crit = nn.CrossEntropyCriterion()
-- print(crit)
print(crit:forward(output, target))
print(crit:backward(output, target))

Just to add to the accepted answer: you have to be careful that the loss function you define (edit distance in your case) is differentiable with respect to the network parameters. Edit distance as usually defined is a discrete quantity, so it is not differentiable as-is.

Related

Is there a way to print loss from Flux.train?

I'm trying to train a UNet in Julia with the help of Flux.
Flux.train!(loss, Flux.params(model), train_data_loader, opt)
batch_loss = loss(train_data, train_targets)
where the loss is
logitcrossentropy
and train_data_loader is
train_data_loader = DataLoader((train_data |> device, train_targets |> device), batchsize=batch_size, shuffle=true)
I don't understand how to get the loss out of Flux.train! for printing (is that validation loss?). An evalcb would also trigger a call that calculates the loss, so it's no different; I want to skip the extra calculation.
What I did instead is call the loss function again, store it in a variable, and print it per batch. Is there a way to print the loss from Flux.train!() instead of calling the loss function again?
Instead of altering train! as @Tomas suggested, the loss function can be instrumented to log its return value. Printing during the calculation sounds like a bad idea for performance, so I've made an example where the loss is logged into a global vector:
using ChainRulesCore

# returns another loss function which behaves like the one passed in,
# but push!es each return value into the vector passed as `history`
function logged_loss(lossfn, history)
    return function _loss(args...)
        err = lossfn(args...)
        ignore_derivatives() do
            push!(history, err)
        end
        return err
    end
end
# initialize log vector
log_vec = Float32[]
# use function above to create logging loss function
newloss = logged_loss(loss, log_vec)
# run the training
Flux.train!(newloss, Flux.params(W, b), train_data, opt)
At this point, log_vec should contain a record of the loss function's return values. This is a rough solution which uses an annoying global variable. How to interpret the logged values also depends on the nature of the optimizer; in my test there was one call per epoch, and it returned a decreasing loss until convergence. [This answer incorporates suggestions from @darsnack]
Note: since log_vec is captured by the logging loss function, to clear the log you must not reassign it, but empty it in place with empty!(log_vec).
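For example (a small illustration of the point above):
empty!(log_vec)        # empties the very array the closure captured, so logging keeps working
# log_vec = Float32[]  # would only rebind the name; `newloss` would keep pushing to the old array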
Adding to @Dan's answer, you can also augment your loss function with logging on the fly using the do syntax:
using ChainRules

loss_history = Float32[]

Flux.train!(Flux.params(model), train_data_loader, opt) do x, y
    err = loss(x, y)
    ChainRules.ignore_derivatives() do
        push!(loss_history, err)
    end
    return err
end
You would need to write your own version of Flux.train! using withgradient instead of the gradient function. withgradient gives you the output of the loss (or, more precisely, of the function you are differentiating). Flux.train! (https://github.com/FluxML/Flux.jl/blob/8bc0c35932c4a871ac73b42e39146cd9bbb1d446/src/optimise/train.jl#L123) is literally a few lines of code, so adapting it to your version is very easy.
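For reference, here is a minimal sketch of such a loop under the implicit-params API used above. It assumes Flux re-exports Zygote's withgradient (otherwise call Zygote.withgradient directly); the name my_train! is purely illustrative:
function my_train!(loss, ps, data, opt)
    for (x, y) in data
        # withgradient returns both the loss value and the gradients w.r.t. ps
        val, grads = Flux.withgradient(() -> loss(x, y), ps)
        Flux.Optimise.update!(opt, ps, grads)
        println("batch loss: ", val)
    end
end

my_train!(loss, Flux.params(model), train_data_loader, opt)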

porting python class to Julialang

I am seeing that Julia explicitly does NOT do classes, and that I should instead embrace mutable structs. Am I going down the correct path here? I diffed my trivial example against the official Flux library, but I cannot work out how to reference self like a Python object does. Is the cleanest way simply to pass the type as a parameter to the function?
Python
# Dense Layer
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    def forward(self, inputs):
        pass
My JuliaLang version so far
mutable struct LayerDense
    num_inputs::Int64
    num_neurons::Int64
    weights
    biases
end

function forward(layer::LayerDense, inputs)
    layer.weights = 0.01 * randn(layer.num_inputs, layer.num_neurons)
    layer.biases = zeros((1, layer.num_neurons))
end
The Flux library's version of a dense layer looks very different to me, and I do not know what they're doing or why. For example, where is the forward pass call? Is it, in Flux, just named after the layer, Dense?
source : https://github.com/FluxML/Flux.jl/blob/b78a27b01c9629099adb059a98657b995760b617/src/layers/basic.jl#L71-L111
struct Dense{F, M<:AbstractMatrix, B}
  weight::M
  bias::B
  σ::F
  function Dense(W::M, bias = true, σ::F = identity) where {M<:AbstractMatrix, F}
    b = create_bias(W, bias, size(W,1))
    new{F,M,typeof(b)}(W, b, σ)
  end
end

function Dense(in::Integer, out::Integer, σ = identity;
               initW = nothing, initb = nothing,
               init = glorot_uniform, bias=true)
  W = if initW !== nothing
    Base.depwarn("keyword initW is deprecated, please use init (which similarly accepts a funtion like randn)", :Dense)
    initW(out, in)
  else
    init(out, in)
  end
  b = if bias === true && initb !== nothing
    Base.depwarn("keyword initb is deprecated, please simply supply the bias vector, bias=initb(out)", :Dense)
    initb(out)
  else
    bias
  end
  return Dense(W, b, σ)
end
This is an equivalent of your Python code in Julia:
mutable struct Layer_Dense
    weights::Matrix{Float64}
    biases::Matrix{Float64}
    Layer_Dense(n_inputs::Integer, n_neurons::Integer) =
        new(0.01 * randn(n_inputs, n_neurons),
            zeros((1, n_neurons)))
end

forward(ld::Layer_Dense, inputs) = nothing
What is important here:
here I create an inner constructor only, as an outer constructor is not needed; by contrast, in the Flux.jl code you have linked, the Dense type defines both inner and outer constructors
in Python the forward function does not do anything, so I copied that behaviour in Julia (your Julia code worked a bit differently); note that instead of self, one should pass an instance of the object to the function as the first argument (and add the ::Layer_Dense type signature so that Julia knows how to correctly dispatch it)
similarly, in Python you store only weights and biases in the class, and I have reflected this in the Julia code; note, however, that for performance reasons it is better to give these two fields of the Layer_Dense struct an explicit type
like where is the forward pass call
In the code you have shared, only constructors of the Dense object are defined. However, in the lines below here and here, the Dense type is defined to be a functor.
Functors are explained here (in general) and here (more specifically for your use case).
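To make the connection concrete, here is a minimal sketch of that functor pattern (the name MyDense is purely illustrative and not part of Flux); making the struct callable plays the role of Python's forward():
struct MyDense
    weights::Matrix{Float64}
    biases::Vector{Float64}
end

# constructor that initializes the parameters, like Python's __init__
MyDense(n_inputs::Integer, n_neurons::Integer) =
    MyDense(0.01 * randn(n_neurons, n_inputs), zeros(n_neurons))

# this method makes every MyDense instance callable: layer(x) is the forward pass
(d::MyDense)(x) = d.weights * x .+ d.biases

layer = MyDense(3, 2)
layer(rand(3))   # forward pass on a length-3 input, returns a length-2 output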

How to use method reference in Java 8 for Map merge?

I have the following two forms of calling a collect operation; both return the same result, but I still cannot depend fully on method references and need a lambda.
<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner)
For this, consider the following stream consisting of 100 random numbers:
List<Double> dataList = new Random().doubles().limit(100).boxed()
        .collect(Collectors.toList());
1) The following example uses pure lambdas:
Map<Boolean, Integer> partition = dataList.stream()
        .collect(() -> new ConcurrentHashMap<Boolean, Integer>(),
                (map, x) -> {
                    map.merge(x < 0.5 ? Boolean.TRUE : Boolean.FALSE, 1, Integer::sum);
                },
                (map, map2) -> {
                    map2.putAll(map);
                });
2) The following tries to use method references, but the 2nd argument still requires a lambda:
Map<Boolean, Integer> partition2 = dataList.stream()
        .collect(ConcurrentHashMap<Boolean, Integer>::new,
                (map, x) -> {
                    map.merge(x < 0.5 ? Boolean.TRUE : Boolean.FALSE, 1, Integer::sum);
                },
                Map::putAll);
How can I rewrite the 2nd argument of the collect method in Java 8 to use a method reference instead of a lambda for this example?
System.out.println(partition.toString());
System.out.println(partition2.toString());
{false=55, true=45}
{false=55, true=45}
A method reference is a handy tool if you have an existing method doing exactly the intended thing. If you need adaptations or additional operations, there is no special syntax for method references to support that, except, when you consider lambda expressions to be that syntax.
Of course, you can create a new method in your class doing the desired thing and create a method reference to it, and that's the right way to go when the complexity of the code grows, as then it will get a meaningful name and become testable. But for simple code snippets, you can use lambda expressions, which are just a simpler syntax for the same result. Technically, there is no difference, except that the compiler-generated method holding the lambda expression body will be marked as "synthetic".
In your example, you can’t even use Map::putAll as merge function, as that would overwrite all existing mappings of the first map instead of merging the values.
A correct implementation would look like
Map<Boolean, Integer> partition2 = dataList.stream()
        .collect(HashMap::new,
                (map, x) -> map.merge(x < 0.5, 1, Integer::sum),
                (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v, Integer::sum)));
but you don’t need to implement it by yourself. There are appropriate built-in collectors already offered in the Collectors class:
Map<Boolean, Long> partition2 = dataList.stream()
        .collect(Collectors.partitioningBy(x -> x < 0.5, Collectors.counting()));

Parametric Type Creation

I'm struggling to understand parametric type creation in julia. I know that I can create a type with the following:
type EconData
    values
    dates::Array{Date}
    colnames::Array{ASCIIString}

    function EconData(values, dates, colnames)
        if size(values, 1) != size(dates, 1)
            error("Date/data dimension mismatch.")
        end
        if size(values, 2) != size(colnames, 2)
            error("Name/data dimension mismatch.")
        end
        new(values, dates, colnames)
    end
end
ed1 = EconData([1;2;3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
However, I can't figure out how to specify how values will be typed. It seems reasonable to me to do something like
type EconData{T}
    values::Array{T}
    ...
    function EconData(values::Array{T}, dates, colnames)
    ...
However, this (and similar attempts) simply produces an error:
ERROR: `EconData{T}` has no method matching EconData{T}(::Array{Int64,1}, ::Array{Date,1}, ::Array{ASCIIString,2})
How can I specify the type of values?
The answer is that things get funky with parametric types and inner constructors; in fact, I think it's probably the most confusing thing in Julia. The immediate solution is to provide a suitable outer constructor:
using Dates

type EconData{T}
    values::Vector{T}
    dates::Array{Date}
    colnames::Array{ASCIIString}

    function EconData(values, dates, colnames)
        if size(values, 1) != size(dates, 1)
            error("Date/data dimension mismatch.")
        end
        if size(values, 2) != size(colnames, 2)
            error("Name/data dimension mismatch.")
        end
        new(values, dates, colnames)
    end
end

EconData{T}(v::Vector{T}, d, n) = EconData{T}(v, d, n)
ed1 = EconData([1,2,3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
What also would have worked is to have done
ed1 = EconData{Int}([1,2,3], [Date(2014,1), Date(2014,2), Date(2014,3)], ["series"])
My explanation might be wrong, but I think the problem is that no parametric outer constructor method is made by default, so you have to call the constructor for a specific instantiation of the type (my second version) or add the outer constructor yourself (first version).
Some other comments: you should be explicit about dimensions, i.e. if all your fields are vectors (1D), use Vector{T} or Array{T,1}, and if they are matrices (2D), use Matrix{T} or Array{T,2}. Make it parametric on the dimension if you need to. If you don't, slow code could be generated, because functions using this type aren't sure about the actual data structure until runtime and so will emit lots of checks.
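As a minimal sketch of that advice (written in post-1.0 syntax, i.e. struct and String instead of the old type and ASCIIString, with the dimension checks omitted for brevity):
using Dates

struct EconData{T}
    values::Vector{T}          # 1-D, element type is the parameter
    dates::Vector{Date}
    colnames::Vector{String}
end

# the default outer constructor infers T from the first argument
ed = EconData([1, 2, 3],
              [Date(2014,1), Date(2014,2), Date(2014,3)],
              ["series"])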

How can this imperative code be rewritten to be more functional?

I found an answer on SO that explained how to write a randomly weighted drop system for a game. I would prefer to write this code in a more functional-programming style but I couldn't figure out a way to do that for this code. I'll inline the pseudo code here:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
How could this be written in a style that's more functional? I am using CoffeeScript and underscore.js, but I'd prefer this answer to be language agnostic because I'm having trouble thinking about this in a functional way.
Here are two more functional versions in Clojure and JavaScript, but the ideas here should work in any language that supports closures. Basically, we use recursion instead of iteration to accomplish the same thing, and instead of breaking in the middle we just return a value and stop recursing.
Original pseudo code:
R = (some random int);
T = 0;
for o in os
    T = T + o.weight;
    if T > R
        return o;
Clojure version (objects are just treated as clojure maps):
(defn recursive-version
  [r objects]
  (loop [t 0
         others objects]
    (let [obj (first others)
          new_t (+ t (:weight obj))]
      (if (> new_t r)
        obj
        (recur new_t (rest others))))))
JavaScript version (using Underscore for convenience). This is conceptually the same as the Clojure version, but be careful, because it could blow the stack.
var js_recursive_version = function(objects, r) {
    var main_helper = function(t, others) {
        var obj = _.first(others);
        var new_t = t + obj.weight;
        if (new_t > r) {
            return obj;
        } else {
            return main_helper(new_t, _.rest(others));
        }
    };
    return main_helper(0, objects);
};
You can implement this with a fold (aka Array#reduce, or Underscore's _.reduce):
An SSCCE:
items = [
    {item: 'foo', weight: 50}
    {item: 'bar', weight: 35}
    {item: 'baz', weight: 15}
]

r = Math.random() * 100

{item} = items.reduce (memo, {item, weight}) ->
    if memo.sum > r
        memo
    else
        {item, sum: memo.sum + weight}
, {sum: 0}

console.log 'r:', r, 'item:', item
You can run it many times at coffeescript.org and see that the results make sense :)
That being said, I find the fold a bit contrived, as you have to remember both the selected item and the accumulated weight between iterations, and it doesn't short-circuit when the item is found.
Maybe a compromise solution between pure FP and the tedium of reimplementing a find algorithm can be considered (using _.find):
total = 0
{item} = _.find items, ({weight}) ->
    total += weight
    total > r
Runnable example.
I find (no pun intended) this algorithm much more accessible than the first one (and it should perform better, as it doesn't create intermediate objects and it does short-circuit).
Update/side-note: the second algorithm is not "pure" because the function passed to _.find is not referentially transparent (it has the side effect of modifying the external total variable), but the algorithm as a whole is referentially transparent. If you were to encapsulate it in a findItem = (items, r) -> function, that function would be pure and would always return the same output for the same input. That's very important, because it means you can get the benefits of FP while using some non-FP constructs (for performance, readability, or whatever reason) under the hood :D
I think the underlying task is randomly selecting 'events' (objects) from the array os with a frequency defined by their respective weights. The approach is to map (i.e. search) a uniformly distributed random number onto the stairstep cumulative probability distribution function.
With positive weights, their cumulative sum increases from 0 to 1. The code you gave us simply searches starting at the 0 end. To maximize speed with repeated calls, precalculate the sums and order the events so the largest weights come first.
It really doesn't matter whether you search with iteration (looping) or recursion. Recursion is nice in a language that tries to be 'purely functional', but it doesn't help in understanding the underlying mathematical problem, and it doesn't help you package the task into a clean function. The Underscore functions are another way of packaging the iteration, but they don't change the basic functionality. Only any and all exit early when the target is found.
For a small os array this simple search is sufficient. But with a large array, a binary search will be faster. Looking in Underscore, I find that sortedIndex uses this strategy. From Lo-Dash (an Underscore drop-in): "Uses a binary search to determine the smallest index at which the value should be inserted into array in order to maintain the sort order of the sorted array"
The basic use of sortedIndex is:
os = [{name:'one',   weight:.7},
      {name:'two',   weight:.25},
      {name:'three', weight:.05}]

t = 0; cumweights = (t += o.weight for o in os)
i = _.sortedIndex(cumweights, R)
os[i]
You can hide the cumulative sum calculation with a nested function like:
osEventGen = (os) ->
    t = 0; xw = (t += y.weight for y in os)
    return (R) ->
        i = _.sortedIndex(xw, R)
        return os[i]

osEvent = osEventGen(os)

osEvent(.3)
# { name: 'one', weight: 0.7 }
osEvent(.8)
# { name: 'two', weight: 0.25 }
osEvent(.99)
# { name: 'three', weight: 0.05 }
In CoffeeScript, Jed Clinger's recursive search could be written like this:
foo = (x, r, t=0) ->
    [y, x...] = x
    t += y
    return [y, t] if x.length==0 or t>r
    return foo(x, r, t)
A loop version using the same basic idea is:
foo = (x, r) ->
    t = 0
    while x.length and t <= r
        [y, x...] = x    # the [first, rest] split
        t += y
    y
Tests on jsPerf (http://jsperf.com/sortedindex) suggest that sortedIndex is faster when os.length is around 1000, but slower than the simple loop when the length is more like 30.
