In PyTorch, you commonly have to zero our the gradients before doing back propagation. Is this the case in Flux? If so, what is the programatic way of doing this?
tl;dr
No, there is no need.
explanation
Flux used to use Tracker, a differentiation system in which each tracked array may hold a gradient. I think this is a similar design to pytorch. Back-propagating twice can lead to the problem which zeroing is intended to avoid (although the defaults try to protect you):
julia> using Tracker
julia> x_tr = Tracker.param([1 2 3])
Tracked 1×3 Matrix{Float64}:
1.0 2.0 3.0
julia> y_tr = sum(abs2, x_tr)
14.0 (tracked)
julia> Tracker.back!(y_tr, 1; once=false)
julia> x_tr.grad
1×3 Matrix{Float64}:
2.0 4.0 6.0
julia> Tracker.back!(y_tr, 1; once=false) # by default (i.e. with once=true) this would be an error
julia> x_tr.grad
1×3 Matrix{Float64}:
4.0 8.0 12.0
Now it uses Zygote, which does not use tracked array types. Instead, the evaluation to be traced must happen with the call to Zygote.gradient, it can then see and manipulate the source code to write new code for the gradient. Repeated calls to this generate the same gradients each time; there is no stored state to need cleaning up.
julia> using Zygote
julia> x = [1 2 3] # an ordinary Array
1×3 Matrix{Int64}:
1 2 3
julia> Zygote.gradient(x -> sum(abs2, x), x)
([2 4 6],)
julia> Zygote.gradient(x -> sum(abs2, x), x)
([2 4 6],)
julia> y, bk = Zygote.pullback(x -> sum(abs2, x), x);
julia> bk(1.0)
([2.0 4.0 6.0],)
julia> bk(1.0)
([2.0 4.0 6.0],)
Tracker can also be used this way, rather than handling param and back! yourself:
julia> Tracker.gradient(x -> sum(abs2, x), [1, 2, 3])
([2.0, 4.0, 6.0] (tracked),)
Related
I am a newbie to Julia and Flux with some experience in Tensorflow Keras and python. I tried to use the Flux.withgradient command to write a user-defined training function with more flexibility. Here is the training part of my code:
loss, grad = Flux.withgradient(modelDQN.evalParameters) do
qEval = modelDQN.evalModel(evalInput)
Flux.mse(qEval, qTarget)
end
Flux.update!(modelDQN.optimizer, modelDQN.evalParameters, grad)
This code works just fine. But if I put the command qEval = modelDQN.evalModel(evalInput) outside the do end loop, as follows:
qEval = modelDQN.evalModel(evalInput)
loss, grad = Flux.withgradient(modelDQN.evalParameters) do
Flux.mse(qEval, qTarget)
end
Flux.update!(modelDQN.optimizer, modelDQN.evalParameters, grad)
The model parameters will not be updated. As far as I know, the do end loop works as an anonymous function that takes 0 arguments. Then why do we need the command qEval = modelDQN.evalModel(evalInput) inside the loop to get the model updated?
The short answer is that anything to be differentiated has to happen inside the (anonymous) function which you pass to gradient (or withgradient), because this is very much not a standard function call -- Zygote (Flux's auto-differentiation library) traces its execution to compute the derivative, and can't transform what it can't see.
Longer, this is Zygote's "implicit" mode, which relies on global references to arrays. The simplest use is something like this:
julia> using Zygote
julia> x = [2.0, 3.0];
julia> g = gradient(() -> sum(x .^ 2), Params([x]))
Grads(...)
julia> g[x] # lookup by objectid(x)
2-element Vector{Float64}:
4.0
6.0
If you move some of that calculation outside, then you make a new array y with a new objectid. Julia has no memory of where this came from, it is completely unrelated to x. They are ordinary arrays, not a special tracked type.
So if you refer to y in the gradient, Zygote cannot infer how this depends on x:
julia> y = x .^ 2 # calculate this outside of gradient
2-element Vector{Float64}:
4.0
9.0
julia> g2 = gradient(() -> sum(y), Params([x]))
Grads(...)
julia> g2[x] === nothing # represents zero
true
Zygote doesn't have to be used in this way. It also has an "explicit" mode which does not rely on global references. This is perhaps less confusing:
julia> gradient(x1 -> sum(x1 .^ 2), x) # x1 is a local variable
([4.0, 6.0],)
julia> gradient(x1 -> sum(y), x) # sum(y) is obviously indep. x1
(nothing,)
julia> gradient((x1, y1) -> sum(y1), x, y)
(nothing, Fill(1.0, 2))
Flux is in the process of changing to use this second form. On v0.13.9 or later, something like this ought to work:
opt_state = Flux.setup(modelDQN.optimizer, modelDQN) # do this once
loss, grads = Flux.withgradient(modelDQN.model) do m
qEval = m(evalInput) # local variable m
Flux.mse(qEval, qTarget)
end
Flux.update!(opt_state, modelDQN.model, grads[1])
I create a new struct called HousingData, and also define function such as iterate and length. However, when I use the function collect for my HousingData object, I run into the following error.
TypeError: in typeassert, expected Integer, got a value of type Float64
import Base: length, size, iterate
struct HousingData
x
y
batchsize::Int
shuffle::Bool
num_instances::Int
function HousingData(
x, y; batchsize::Int=100, shuffle::Bool=false, dtype::Type=Array{Float64})
new(convert(dtype,x),convert(dtype,y),batchsize,shuffle,size(y)[end])
end
end
function length(d::HousingData)
return ceil(d.num_instances/d.batchsize)
end
function iterate(d::HousingData, state=ifelse(
d.shuffle, randperm(d.num_instances), collect(1:d.num_instances)))
if(length(state)==0)
return nothing
end
return ((d.x[:,state[1]],d.y[:,state[1]]),state[2:end])
end
x1 = randn(5, 100); y1 = rand(1, 100);
obj = HousingData(x1,y1; batchsize=20)
collect(obj)
There are multiple problems in your code. The first one is related to length not returning an integer, but rather a float. This is explained by the behavior of ceil:
julia> ceil(3.8)
4.0 # Notice: 4.0 (Float64) and not 4 (Int)
You can easily fix this:
function length(d::HousingData)
return Int(ceil(d.num_instances/d.batchsize))
end
Another problem lies in the logic of your iteration function, which is not consistent with the advertised length. To take a smaller example than yours:
julia> x1 = [i+j/10 for i in 1:2, j in 1:6]
2×6 Array{Float64,2}:
1.1 1.2 1.3 1.4 1.5 1.6
2.1 2.2 2.3 2.4 2.5 2.6
# As an aside, unless you really want to work with 1xN matrices
# it is more idiomatic in Julia to use 1D Vectors in such situations
julia> y1 = [Float64(j) for i in 1:1, j in 1:6]
1×6 Array{Float64,2}:
1.0 2.0 3.0 4.0 5.0 6.0
julia> obj = HousingData(x1,y1; batchsize=3)
HousingData([1.1 1.2 … 1.5 1.6; 2.1 2.2 … 2.5 2.6], [1.0 2.0 … 5.0 6.0], 3, false, 6)
julia> length(obj)
2
julia> for (i, e) in enumerate(obj)
println("$i -> $e")
end
1 -> ([1.1, 2.1], [1.0])
2 -> ([1.2, 2.2], [2.0])
3 -> ([1.3, 2.3], [3.0])
4 -> ([1.4, 2.4], [4.0])
5 -> ([1.5, 2.5], [5.0])
6 -> ([1.6, 2.6], [6.0])
The iterator produces 6 elements, whereas the length of this object is only 2. This explains why collect errors out:
julia> collect(obj)
ERROR: ArgumentError: destination has fewer elements than required
Knowing your code, you're probably the best person to fix its logic.
Newbie Julia question here. Given two arrays,
W = [randn(3,2), randn(3,2)]
b = [randn(3), randn(3)]
I would like to do a "nested broadcast" along the lines of,
W .+ b = [W[1].+b[1], W[2].+b[2]]
So far the best I've been able to come up with is,
[Wi.+bi (Wi,bi) for zip(W,b)]
Coming from a Python background this feels sacrilegious. Is there a better way to do this in Julia?
You could do something like the following:
julia> W = [randn(3,2), randn(3,2)]
2-element Array{Array{Float64,2},1}:
[0.39179718902868116 -0.5387622679356612; -0.594274465053327 0.018804631512093436; -2.273706742420988 -0.4638617400026042]
[0.3249960563405678 -0.4877554417492699; 0.5036437919340767 1.3172770503034696; 0.03501532820428975 -0.2675024677340758]
julia> b = [randn(3), randn(3)]
2-element Array{Array{Float64,1},1}:
[1.2571527266220441, 0.21599608118129476, 0.21498843153804936]
[-0.528960345932853, 0.5610435189953311, -0.8636370930615718]
# A helper function which broadcasts once
julia> s_(Wi, bi) = Wi .+ bi
s_ (generic function with 1 method)
# Broadcast the helper function
julia> s_.(W, b)
2-element Array{Array{Float64,2},1}:
[1.6489499156507252 0.718390458686383; -0.3782783838720323 0.2348007126933882; -2.0587183108829388 -0.24887330846455485]
[-0.20396428959228524 -1.016715787682123; 1.0646873109294077 1.8783205692988008; -0.828621764857282 -1.1311395607956476]
As mentioned in comments, you can use one of the available user-definable infix operators to name the helper function, which allows for a nicer syntax (the particular symbol used below can be obtained by typing \oplus then Tab):
julia> ⊕(x, y) = x .+ y
⊕ (generic function with 1 method)
julia> W .⊕ b
2-element Array{Array{Float64,2},1}:
[1.6489499156507252 0.718390458686383; -0.3782783838720323 0.2348007126933882; -2.0587183108829388 -0.24887330846455485]
[-0.20396428959228524 -1.016715787682123; 1.0646873109294077 1.8783205692988008; -0.828621764857282 -1.1311395607956476]
Or - again as mentioned in comments - if you don't feel like explicitly declaring a helper function:
julia> ((Wi, bi)->Wi.+bi).(W, b)
2-element Array{Array{Float64,2},1}:
[1.6489499156507252 0.718390458686383; -0.3782783838720323 0.2348007126933882; -2.0587183108829388 -0.24887330846455485]
[-0.20396428959228524 -1.016715787682123; 1.0646873109294077 1.8783205692988008; -0.828621764857282 -1.1311395607956476]
(I personally have more difficulty reading that last version, but YMMV)
When you want to broadcast broadcast, then broadcast broadcast:
julia> broadcast.(+, W, b)
2-element Array{Array{Float64,2},1}:
[-0.7364111670769904 0.010994354421031916; -0.9010128415786036 0.22868802910609998; 1.2030371118617933 0.21305414210853912]
[0.19183885867446926 0.5362077496502086; 1.5909421118115665 0.1235808501390212; 1.5190965380769597 0.1883638848487652]
julia> [W[1].+b[1], W[2].+b[2]]
2-element Array{Array{Float64,2},1}:
[-0.7364111670769904 0.010994354421031916; -0.9010128415786036 0.22868802910609998; 1.2030371118617933 0.21305414210853912]
[0.19183885867446926 0.5362077496502086; 1.5909421118115665 0.1235808501390212; 1.5190965380769597 0.1883638848487652]
Suppose we have to take multiple input in one line in Python 3 then:-
1st method:-
x, y = input(), input()
2nd method:-
x, y = input().split()
3rd method:-
Using list comprehension
x, y = [int(x) for x in [x, y]]
4th method:-
x, y = map(int, input().split())
So these are the methods I know in python 3.
Can anyone tell me the alternate code in Julia?
readdlm(IOBuffer(readline()))
The best simple parser for all occasions is readdlm.
It will provide you processing any user input as an array and hence will be most robust for any circumstances:
julia> using DelimitedFiles
julia> readdlm(IOBuffer(readline()))
z b c
1×3 Array{Any,2}:
"z" "b" "c"
julia> readdlm(IOBuffer(readline()))
1 2
1×2 Array{Float64,2}:
1.0 2.0
Since it is an Array the same multi-argument assignment will work as in Python
julia> x, y = readdlm(IOBuffer(readline()))
1 2 3
1×3 Array{Float64,2}:
1.0 2.0 3.0
julia> x, y
(1.0, 2.0)
As we can't directly use input function I implemented like this in Julia.
function input()
x, y= readline(stdin), readline(stdin)
end
So I hope you liked this one.
I am trying to port some code and now I've hit a sticky bit. The original code is in C++. I need to port a union that has two 32 bit ints (in an array) and a double.
So far I have:
I1 = UInt32(56) # arbitrary integer values for example
I2 = UInt32(1045195987)
# do transforms on I1 and I2 as per the code I'm porting
A = bits(I1)
B = bits(I2)
return parse(Float64, string(A,B))
Is this the way to do it? The string operation seems expensive. Any advice appreciated.
I also come from mostly C/C++ programming, and this is what I do to handle the problem:
First, create an immutable type with two UInt32 elements:
immutable MyType
a::UInt32
b::UInt32
end
Then you can convert a vector of Float64 to that type with reinterpret.
For example:
julia> x = [1.5, 2.3]
2-element Array{Float64,1}:
1.5
2.3
julia> immutable MyType ; a::UInt32 ; b::UInt32 ; end
julia> y = reinterpret(MyType, x)
2-element Array{MyType,1}:
MyType(0x00000000,0x3ff80000)
MyType(0x66666666,0x40026666)
julia> x[1]
1.5
julia> y[1]
MyType(0x00000000,0x3ff80000)
julia> y[1].a
0x00000000
julia> y[1].b
0x3ff80000
Note: the two vectors still point to the same memory, so you can even update elements, using either type.
julia> x[1] = 10e91
1.0e92
julia> y[1].a
0xbf284e24
julia> y[1].b
0x53088ba3
julia> y[1] = MyType(1,2)
MyType(0x00000001,0x00000002)
julia> x[1]
4.2439915824e-314