Unexpected behavior using BSON: @load in Julia - julia

I am training a neural network model using the Flux package in Julia. During training, whenever an epoch improves the model, the model is saved to disk.
For this, I use the following lines:
if acc_te[epoch_i] >= best_acc
    @info(" -> New best accuracy! Saving model out to mymodel.bson")
    @save "mymodel.bson" m
    best_acc = acc_te[epoch_i];
    last_improvement = epoch_i;
end
I assume this is pretty standard (I took it from model-zoo).
At the end of the training, it is fair to assume that the saved model "mymodel.bson" is the best I got. So far so good!
Now, the problem:
The next morning, I open a terminal and this is what happens (it is the same every time):
julia> using Flux;
julia> using BSON: @save
julia> using BSON: @load
julia> @load "mymodel.bson" model
ERROR: KeyError: key :model not found
Stacktrace:
[1] getindex(::Dict{Symbol,Any}, ::Symbol) at ./dict.jl:477
[2] top-level scope at /home/gbrunini/.julia/packages/BSON/XAts7/src/BSON.jl:53
julia>
julia> #maybe try another name
julia> @load "mymodel.bson" someothername
ERROR: KeyError: key :someothername not found
Stacktrace:
[1] getindex(::Dict{Symbol,Any}, ::Symbol) at ./dict.jl:477
[2] top-level scope at /home/gbrunini/.julia/packages/BSON/XAts7/src/BSON.jl:53
julia> #maybe try another name
.....same error...
julia> #maybe try another name
.... same error....
julia> #maybe try another name
.... same error....
julia> #maybe try another name
julia> @load "mymodel.bson" m # eureka! this name works!
Why is this happening? Are there some forbidden names?
Sometimes it works with other names, but I always have to try at least 5-6 different names until I find one that works! It's getting annoying.
What am I doing wrong?
Thanks in advance and stay safe!

The macros BSON.@save and BSON.@load store and restore a variable under the exact name given. You cannot restore it under a different name.
using BSON
x = 5
BSON.@save "mydoc.bson" x
This has stored the value 5 together with the name x.
If you try to recover it using the wrong name, it will not be found:
julia> BSON.@load "mydoc.bson" y
ERROR: KeyError: key :y not found
Instead, you need to recover it with the original name. Note how the current value of x gets overwritten:
julia> x=999
999
julia> BSON.@load "mydoc.bson" x
julia> x
5
So what can you do when you need more flexibility? Use the BSON.parse function, which returns a Dict that you can use however you need:
julia> BSON.parse("mydoc.bson")
Dict{Symbol,Any} with 1 entry:
:x => 5
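To make the Dict-based route concrete, here is a minimal sketch of loading a saved value under a different name. It uses BSON.load rather than BSON.parse; as far as I know, BSON.load returns the same kind of Dict but with Julia values reconstructed, which matters for structs such as models. The temporary path and the name y are only illustrative.

```julia
using BSON

# Hypothetical temporary file so the sketch does not touch real data.
path = joinpath(mktempdir(), "mydoc.bson")

x = 5
bson(path, Dict(:x => x))   # stores the value under the key :x

saved = BSON.load(path)     # a Dict keyed by the saved names
y = saved[:x]               # bind the stored value to any name you like

println(collect(keys(saved)))
println(y)
```

Listing the keys first is also a quick way to find out which name a file was saved under, instead of guessing.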

BSON uses Dicts to save and load
BSON saves and loads a top-level Dict, which usually stores each variable under its name as the key (m in your case).
When you load your .bson file, that dictionary is retrieved and the variables are instantiated again, with each dict key as the variable name and each dict value as its value.
The README.md of BSON.jl already showcases this.
So if you want more flexibility, you could save a Dict directly via BSON (check out the mentioned README.md) and do something like:
if acc_te[epoch_i] >= best_acc
    @info(" -> New best accuracy! Saving model out to mymodel.bson")
    bson("mymodel.bson", Dict(:model => m, <...other key-value stuff you want to save...>))
    best_acc = acc_te[epoch_i];
    last_improvement = epoch_i;
end
Bonus / Opinion
BSON offers better interoperability between languages, but when it comes to saving/loading performance and file size, I found Serialization doing a better job. It may be worth checking out as an alternative, depending on your use case.
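For comparison, a minimal sketch of the Serialization route, assuming the Serialization standard library and Julia 1.1 or later (for the file-path methods). The Dict below is only a hypothetical stand-in for a trained model, and the file name is made up. Keep in mind that the serialized format is not guaranteed to be readable across different Julia versions.

```julia
using Serialization

# Hypothetical stand-in for a trained model.
model = Dict(:weights => [0.1 0.2; 0.3 0.4], :bias => [0.0, 0.0])

path = joinpath(mktempdir(), "mymodel.jls")
serialize(path, model)          # write the object to disk

restored = deserialize(path)    # the caller picks the name here
println(restored[:weights] == model[:weights])
```

Because deserialize simply returns the object, there is no stored variable name to match when loading.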

Since BSON saves the data as a dictionary, you can choose the key name for each variable you wish to save by supplying it explicitly (here, model is stored under MNIST_model):
BSON.bson( "./models/MNISTmodel2023b.bson" , MNIST_model = model )
So the variable model is saved under the key name MNIST_model, which can be inspected with:
BSON.parse( "./models/MNISTmodel2023b.bson" )
->Dict{Symbol, Any} with 1 entry:
:MNIST_model => Dict{Symbol, Any}(:tag=>"struct", :type=>Dict{Symbol, Any}(:t…
Now you can load it under the new name you provided:
@load "./models/MNISTmodel2023b.bson" MNIST_model
and use it like so:
test_res = MNIST_model( xtest_batch )
If you forget the names of the keys for the file, you can always do
dict_bson = BSON.parse( "./models/MNISTmodel2023b.bson" )
string.( collect( keys( dict_bson ) ) )
1-element Vector{String}:
"MNIST_model"

Related

How to explore the structure and dimensions of an object in Octave?

In MATLAB, the properties function sounds like a possible equivalent to the R commands that acquaint you with a particular object in the working environment, providing information about its structure (data.frame, matrix, list, vector) and variable types (character, numeric) (for example, the R command str()), its dimensions (perhaps via dim()), and the names of its variables (names()).
However, this function is not operational in Octave:
>> properties(data)
warning: the 'properties' function is not yet implemented in Octave
I installed the dataframe package as suggested in a comment on the post linked above, using pkg install -forge dataframe, and loaded it with pkg load dataframe.
But I can't find a way of getting a summary of the structure and dimensions of a dataset data.mat in the workspace.
I believe it's a structure consisting of a 4 x 372,550 numerical matrix, two 4 x 46,568 numerical matrices, and a 256 x 1 character matrix. To get this information I had to scroll through many pages of the printout of data.
This info is not available on the Octave IDE, where I get:
Name Class Dimensions
data struc 1 x 1
a far cry from the complexity of the object data.
What is the smart way of getting this information about an object in the workspace in Octave?
Following up on the first answer provided, here is what I get with whos:
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
data 1x1 7452040 struct
Total is 1 element using 7452040 bytes
This is not particularly informative about what data really contains. In fact, I just found a way to extract the names inside data:
>> fieldnames(data)
ans =
{
[1,1] = testData
[2,1] = trainData
[3,1] = validData
[4,1] = vocab
}
Now if I call
>> size(data)
ans =
1 1
the output is not very useful. On the other hand, knowing the names of the matrices within data I can do
>> size(data.trainData)
ans =
4 372550
which is indeed informative.
If you type the name of the variable, you'll see information about it. In your case it's a struct, so it'll tell you the field names. Relevant functions are: size, ndims, class, fieldnames, etc.
size(var)
class(var)
etc.
You refer to .mat. Maybe you have a MAT-file, which you can load with load filename. Once loaded you can examine and use the variables in the file.
whos
prints simple information on the variables in memory, most useful to see what variables exist.
Following up on your edited question. This works in Octave:
for s=fieldnames(data)'
s=s{1};
tmp=data.(s);
disp([s,' - ',class(tmp),' - ',mat2str(size(tmp))])
end
It prints basic information of each of the members of the struct. It does assume that data is a 1x1 struct array. Note that a struct can be an array:
data(2).testData = [];
This causes your data struct to become a 1x2 struct array. This is why size(data) is relevant. class is also important (it's shown in the output of whos). Variables can be of type double (normal arrays), other numeric types, logical, struct, cell (an array of arrays), or a custom class that you can write yourself.
I highly recommend reading an introductory text on MATLAB/Octave, as it works very differently from R. It's not just a different flavor of language, it's a whole different world.

What does the "Base" keyword mean in Julia?

I saw this example in the Julia language documentation. It uses something called Base. What is this Base?
immutable Squares
count::Int
end
Base.start(::Squares) = 1
Base.next(S::Squares, state) = (state*state, state+1)
Base.done(S::Squares, s) = s > S.count;
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count;
Base is a module which defines many of the functions, types and macros used in the Julia language. You can view the files for everything it contains here or call whos(Base) to print a list.
In fact, these functions and types (which include things like sum and Int) are so fundamental to the language that they are included in Julia's top-level scope by default.
This means that we can just use sum instead of Base.sum every time we want to use that particular function. Both names refer to the same thing:
julia> sum === Base.sum
true
julia> @which sum # show where the name is defined
Base
So why, you might ask, is it necessary to write things like Base.start instead of simply start?
The point is that start is just a name. We are free to rebind names in the top-level scope to anything we like. For instance start = 0 will rebind the name 'start' to the integer 0 (so that it no longer refers to Base.start).
Concentrating now on the specific example in docs, if we simply wrote start(::Squares) = 1, then we find that we have created a new function with 1 method:
julia> start
start (generic function with 1 method)
But Julia's iterator interface (invoked using the for loop) requires us to add the new method to Base.start! We haven't done this and so we get an error if we try to iterate:
julia> for i in Squares(7)
println(i)
end
ERROR: MethodError: no method matching start(::Squares)
By updating the Base.start function instead by writing Base.start(::Squares) = 1, the iterator interface can use the method for the Squares type and iteration will work as we expect (as long as Base.done and Base.next are also extended for this type).
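The same point about extending functions that live in Base still applies in current Julia, although the iteration interface itself changed: since Julia 1.0, start, next and done were replaced by a single Base.iterate function, and immutable became struct. A minimal sketch of the Squares example under those assumptions:

```julia
struct Squares
    count::Int
end

# Extend Base.iterate (not a new local `iterate`) so that `for` loops can find it.
# It returns (item, next_state), or nothing when the iteration is finished.
Base.iterate(S::Squares, state=1) = state > S.count ? nothing : (state * state, state + 1)
Base.eltype(::Type{Squares}) = Int
Base.length(S::Squares) = S.count

for i in Squares(4)
    println(i)
end

squares = collect(Squares(4))
```

Writing iterate(S::Squares, state=1) = ... without the Base. prefix, in a session that has not imported iterate, would define a separate function that the for loop never calls, which is the same failure mode described above for start.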
I'll grant that for something so fundamental, the explanation is buried a bit far down in the documentation, but http://docs.julialang.org/en/release-0.4/manual/modules/#standard-modules describes this:
There are three important standard modules: Main, Core, and Base.
Base is the standard library (the contents of base/). All modules
implicitly contain using Base, since this is needed in the vast
majority of cases.

julia static field of composite type

In Julia am I allowed to create & use static fields? Let me explain my problem with a simplified example. Let's say we have a type:
type Foo
bar::Dict()
baz::Int
qux::Float64
function Foo(fname,baz_value,qux_value)
dict = JLD.load(fname)["dict"] # It is a simple dictionary loading from a special file
new(dict,baz_value,quz_value)
end
end
As you can see, I load a dictionary from a JLD file and store it in the Foo type along with the other two values, baz_value and qux_value. Now, let's say I create 3 objects of type Foo.
vars = [ Foo("mydictfile.jld",38,37.0) for i=1:3]
Here, all of the Foo objects load the same dictionary. This is quite a big file (~10GB) and I don't want to load it many times. So,
I simply ask: is there any way in Julia to load it just once so that all 3 of these objects can reach it? (That's why I used the word "static" in the question.)
For such a simple question, my approach might look silly, but as a next step I make this Foo type iterable, and I need to use this dictionary inside the next(d::Foo, state) function.
EDIT
Actually, I've just found a way, but I want to ask whether it is correct.
Rather than giving the file name to the Foo constructor, if I load the dictionary into a variable before creating the objects and pass that same variable to all of the constructors, I guess each constructor just stores a pointer to the same dictionary rather than creating it again and again. Am I right?
So, the modified version would look like this:
dict = JLD.load("mydictfile.jld")["dict"]
vars = [ Foo(dict,38,37.0) for i=1:3]
By the way, I would still like to hear whether I can do the same thing completely inside the Foo type (I mean, in its constructor).
You are making the type "too special" by adding the inner constructor. Julia provides default constructors if you do not provide an inner constructor; these just fill in the fields in an object of the new type.
So you can do something like:
immutable Foo{K,V}
bar::Dict{K,V}
baz::Int
qux::Float64
end
dict = JLD.load("mydictfile.jld")["dict"]
vars = [Foo(dict, i, Float64(i+1)) for i in 1:3]
Note that it was a syntax error to include the parentheses after Dict in the type definition.
The {K,V} makes the Foo type parametric, so that you can make different kinds of Foo type, with different Dict types inside, if necessary. Even if you only use it for a single type of Dict, this will give more efficient code, since the type parameters K and V will be inferred when you create the Foo object. See the Julia manual: http://docs.julialang.org/en/release-0.5/manual/performance-tips/#avoid-fields-with-abstract-containers
So now you can try the code without even having the JLD file available (as we do not, for example):
julia> dict = Dict("a" => 1, "b" => 2)
julia> vars = [Foo(dict, i, Float64(i+1)) for i in 1:3]
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("b"=>2,"a"=>1),3,4.0)
You can see that it is indeed the same dictionary (i.e. only a reference is actually stored in the type object) by modifying one of them and seeing that the others also change, i.e. that they point to the same dictionary object:
julia> vars[1].bar["c"] = 10
10
julia> vars
3-element Array{Foo{String,Int64},1}:
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),1,2.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),2,3.0)
Foo{String,Int64}(Dict("c"=>10,"b"=>2,"a"=>1),3,4.0)
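The sharing can also be checked directly with the identity operator ===. This is a minimal sketch assuming Julia 0.6 or later, where the immutable keyword is spelled struct; the small dictionary stands in for the one loaded from the JLD file.

```julia
struct Foo{K,V}
    bar::Dict{K,V}
    baz::Int
    qux::Float64
end

# Stand-in for the large dictionary loaded once from disk.
dict = Dict("a" => 1, "b" => 2)
vars = [Foo(dict, i, Float64(i + 1)) for i in 1:3]

# Every object holds a reference to the very same Dict, not a copy.
same_object = all(v -> v.bar === dict, vars)

# A change made through one object is visible through the others.
vars[1].bar["c"] = 10
println(same_object, " ", vars[3].bar["c"])
```

Note that making Foo immutable only prevents rebinding its fields; the Dict it refers to can still be mutated, as the last lines show.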

Immutable dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since I have a lot of parameters, which I will end up using in different places, my idea was to make this dict global. And as we all know, global variables are not that great, so I wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do I do it? Do I have to do it?
Bonus answerable stuff :)
Is my code even OK? (It is the first thing I did with Julia, and coming from C/C++ and Python I have a tendency to do things differently.) Do I need to check whether the file is actually open? Is my way of reading the file "Julia-like"? I could also use readall and then eachmatch. I don't see the "right way to do it" (like in Python).
Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right value. The reason for this is because arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They go around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
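The constructor from an array of pairs mentioned above could be sketched as a fold over the pairs, adding one entry at a time. This assumes Julia 1.x, where Base.ImmutableDict{K,V}() creates an empty dictionary; the helper name immutabledict and the parameter values are hypothetical.

```julia
# Hypothetical helper: build an ImmutableDict by chaining one pair after another.
function immutabledict(pairs_in::AbstractVector{Pair{K,V}}) where {K,V}
    foldl((d, p) -> Base.ImmutableDict(d, p), pairs_in;
          init = Base.ImmutableDict{K,V}())
end

params = immutabledict(["alpha" => "1.0", "beta" => "2.5"])

println(params["beta"])
println(length(params))
```

Each step wraps the previous dictionary in a new one, so lookups walk the chain and get slower as the number of entries grows.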
First off, I am new to Julia (I have been using/learning it for only two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
a specialization (a new dispatch?) of the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. The same should apply to all other functions that end with ! and hence modify the data.
As far as I understand, it is only possible to have subtypes of abstract types. Therefore you can't make UnmodifiableDict a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
You may also want to have a look at these threads:
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
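As an alternative to copying the Dict source, here is a sketch of a thin read-only wrapper. This is a different technique from the copy-and-paste approach described above: it subtypes AbstractDict (which is abstract, so subtyping is allowed) and forwards only the read operations to a private Dict. It assumes Julia 1.x; the type name UnmodifiableDict comes from the answer, everything else is illustrative.

```julia
struct UnmodifiableDict{K,V} <: AbstractDict{K,V}
    data::Dict{K,V}
    # Copy the input so later changes to the original Dict do not leak in.
    UnmodifiableDict(d::Dict{K,V}) where {K,V} = new{K,V}(copy(d))
end

# Forward the read-only part of the dictionary interface.
Base.length(d::UnmodifiableDict) = length(d.data)
Base.iterate(d::UnmodifiableDict, state...) = iterate(d.data, state...)
Base.get(d::UnmodifiableDict, key, default) = get(d.data, key, default)
Base.getindex(d::UnmodifiableDict, key) = d.data[key]
Base.haskey(d::UnmodifiableDict, key) = haskey(d.data, key)

# Reject writes explicitly with a readable message.
Base.setindex!(::UnmodifiableDict, value, key) = error("UnmodifiableDict is read-only")

params = UnmodifiableDict(Dict("n" => "10", "tol" => "1e-6"))
println(params["n"])
```

This does not stop someone from reaching into params.data directly, so it guards against accidents rather than enforcing immutability.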
REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = Dict(i => sind(i) for i = 0.0:5:3600);
const D2 = Dict(i => sind(i) for i = 0.0:5:3600);
function test(D)
for jdx = 1:1000
# D[2] = 2
for idx = 0.0:5:3600
a = D[idx]
end
end
end
## Times given after an initial run to allow for compiling
@time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
@time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update: Keep in mind, though, that in base Julia, unlike some other languages, redefining a constant is not forbidden; you just get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = Dict(i => sind(i) for i = 0.0:5:3600);
const D2 = Dict(i => sind(i) for i = 0.0:5:3600);
function test1()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D1[idx]
end
end
end
function test2()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D2[idx]
end
end
end
## Times given after an initial run to allow for compiling
@time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
@time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

Assert type information onto computed results in Julia

Problem
I read in an array of strings from a file.
julia> file = open("word-pairs.txt");
julia> lines = readlines(file);
But Julia doesn't know that they're strings.
julia> typeof(lines)
Array{Any,1}
Question
Can I tell Julia this somehow?
Is it possible to insert type information onto a computed result?
It would be helpful to know the context where this is an issue, because there might be a better way to express what you need - or there could be a subtle bug somewhere.
Can I tell Julia this somehow?
No, because the readlines function explicitly creates an Any array (a = {}): https://github.com/JuliaLang/julia/blob/master/base/io.jl#L230
Is it possible to insert type information onto a computed result?
You can convert the array:
r = convert(Array{ASCIIString,1}, w)
Or, create your own readstrings function based on the link above, but using ASCIIString[] for the collection array instead of {}.
Isaiah is right about the limits of readlines. More generally, often you can say
n = length(A)::Int
when generic type inference fails but you can guarantee the type in your particular case.
As of 0.3.4:
julia> typeof(lines)
Array{Union(ASCIIString,UTF8String),1}
I just wanted to warn against:
convert(Array{ASCIIString,1}, lines)
which can fail (for non-ASCII input). I guess that in this case nothing needs to be done, but this should work:
convert(Array{UTF8String,1}, lines)
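The answers above target Julia 0.3, where ASCIIString and UTF8String were separate types. As a sketch of the same two techniques in current Julia (assuming 1.x, where there is a single String type and readlines already returns a Vector{String}), the untyped array below is only a stand-in for some computed result:

```julia
# Stand-in for a computed result whose element type was not inferred.
raw = Any["alpha", "beta", "gamma"]

# Convert the container to a concretely typed one; this throws if an
# element cannot be converted to String.
typed = convert(Vector{String}, raw)

# Assert the type of a single computed value; this throws a TypeError
# if the value is not an Int.
n = length(typed)::Int

println(typeof(typed), " ", n)
```

The conversion creates a new array, while the `::Int` assertion only checks the value and gives the compiler a type it can rely on.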

Resources