Inspecting memoization in Julia - julia

There is the package Memoize.jl, with which one can memoize in Julia. Its #memoize macro creates a dictionary. Is there a way to inspect this dictionary?
As an example, after I execute
#memoize f(n) = n ≤ 1 ? n : f(n-1) + f(n-1)
#show f(10)
I would like to inspect the intermediate values of f which have been generated, so f(0), f(1), ..., f(10).
Someone had written Memo.jl as a replacement or enhancement of Memoize.jl allowing for customization and inspection. However, this package seems to be not maintained.

One of the nice things about Julia is that packages like this are just more Julia code instead of doing something in C that's impossible to poke at from the main language. And the code for Memoize is pretty simple. Take a look at this line:
fcachename = Symbol("##", f, "_memoized_cache")
That tells you the name of the cache dictionary, given f as the name of the function that's being memoized. So let's try accessing the variable with that name after running your example code:
julia> var"##f_memoized_cache"
IdDict{Any,Any} with 10 entries:
(7,) => 64
(6,) => 32
(4,) => 8
(5,) => 16
(9,) => 256
(10,) => 512
(2,) => 2
(8,) => 128
(1,) => 1
(3,) => 4
Voila! There's the actual cache. It's just an IdDict with a weird name in same module as the method definition. The var"..." syntax is a recently-added custom string literal syntax for an identifier with a "strange" name—it's a shorthand for doing eval(Symbol("##f_memoized_cache")).
Of course, since this isn't an official documented part of the Memoize API, you can't rely on that not changing, but it works currently. I'm not sure if you want a more official API, but if you do you could open an issue asking for that as a new feature.

Related

Dialyzer does not catch errors on returned functions

Background
While playing around with dialyzer, typespecs and currying, I was able to create an example of a false positive in dialyzer.
For the purposes of this MWE, I am using diallyxir (versions included) because it makes my life easier. The author of dialyxir confirmed this was not a problem on their side, so that possibility is excluded for now.
Environment
$ elixir -v
Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Elixir 1.13.2 (compiled with Erlang/OTP 24)
Which version of Dialyxir are you using? (cat mix.lock | grep dialyxir):
"dialyxir": {:hex, :dialyxir, "1.1.0", "c5aab0d6e71e5522e77beff7ba9e08f8e02bad90dfbeffae60eaf0cb47e29488", [:mix], [{:erlex, ">= 0.2.6", [hex: :erlex, repo: "hexpm", optional: false]}], "hexpm", "07ea8e49c45f15264ebe6d5b93799d4dd56a44036cf42d0ad9c960bc266c0b9a"},
"erlex": {:hex, :erlex, "0.2.6", "c7987d15e899c7a2f34f5420d2a2ea0d659682c06ac607572df55a43753aa12e", [:mix], [], "hexpm", "2ed2e25711feb44d52b17d2780eabf998452f6efda104877a3881c2f8c0c0c75"},
Current behavior
Given the following code sample:
defmodule PracticingCurrying do
#spec greater_than(integer()) :: (integer() -> String.t())
def greater_than(min) do
fn number -> number > min end
end
end
Which clearly has a wrong typing, I get a success message:
$ mix dialyzer
Compiling 1 file (.ex)
Generated grokking_fp app
Finding suitable PLTs
Checking PLT...
[:compiler, :currying, :elixir, :gradient, :gradualizer, :kernel, :logger, :stdlib, :syntax_tools]
Looking up modules in dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Finding applications for dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Finding modules for dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Checking 518 modules in dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Adding 44 modules to dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
done in 0m24.18s
No :ignore_warnings opt specified in mix.exs and default does not exist.
Starting Dialyzer
[
check_plt: false,
init_plt: '/home/user/Workplace/fl4m3/grokking_fp/_build/dev/dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt',
files: ['/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.ImmutableValues.beam',
'/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.PracticingCurrying.beam',
'/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.TipCalculator.beam'],
warnings: [:unknown]
]
Total errors: 0, Skipped: 0, Unnecessary Skips: 0
done in 0m1.02s
done (passed successfully)
Expected behavior
I expected dialyzer to tell me the correct spec is #spec greater_than(integer()) :: (integer() -> bool()).
As a side note (and comparison, if you will) gradient does pick up the error.
I know that comparing these tools is like comparing oranges and apples, but I think it is still worth mentioning.
Questions
Is dialyzer not intended to catch this type of error?
If it should catch the error, what can possibly be failing? (is it my example that is incorrect, or something inside dialyzer?)
I personally find it hard to believe this could be a bug in Dialyzer, the tool has been used rather extensively by a lot of people for me to be the first to discover this error. However, I cannot explain what is happening.
Help is appreciated.
Dialyzer is pretty optimistic in its analysis and ignores some categories of errors.
This article provides some advanced explanations about its approach and limitations.
In the particular case of anonymous functions, dialyzer seems to perform a very minimal check
when they are being declared: it will ignore both the types of its arguments and return type, e.g.
the following doesn't lead any error even if is clearly wrong:
# no error
#spec add(integer()) :: (String.t() -> String.t())
def add(x) do
fn y -> x + y end
end
It will however point out a mismatch in arity, e.g.
# invalid_contract
# The #spec for the function does not match the success typing of the function.
#spec add2(integer()) :: (integer(), integer() -> integer())
def add2(x) do
fn y -> x + y end
end
Dialyzer might be able to detect a type conflict when trying to use the anonymous function,
but this isn't guaranteed (see article above), and the error message might not be helpful:
# Function main/0 has no local return.
def main do
positive? = greater_than(0)
positive?.(2)
end
We don't know what is the problem exactly, not even the line causing the error. But at least we know there is one and can debug it.
In the following example, the error is a bit more informative (using :lists.map/2 instead of Enum.map/2 because
dialyzer doesn't understand the enumerable protocol):
# Function main2/0 has no local return.
def main2 do
positive? = greater_than(0)
# The function call will not succeed.
# :lists.map(_positive? :: (integer() -> none()), [-2 | 0 | 1, ...])
# will never return since the success typing arguments are
# ((_ -> any()), [any()])
:lists.map(positive?, [1, 0, -2])
end
This tells us that dialyzer inferred the return type of greater_than/1 to be (integer() -> none()).
none is described in the article above as:
This is a special type that means that no term or type is valid.
Usually, when Dialyzer boils down the possible return values of a function to none(), it means the function should crash.
It is synonymous with "this stuff won't work."
So dialyzer knows that this function cannot be called successfully, but doesn't consider it to be a type clash until actually called, so it will allow the declaration (in the same way you can perfectly create a function that just raises).
Disclaimer: I couldn't find an official explanation regarding how dialyzer handles anonymous
functions in detail, so the explanations above are based on my observations and interpretation

What does the "Base" keyword mean in Julia?

I saw this example in the Julia language documentation. It uses something called Base. What is this Base?
immutable Squares
count::Int
end
Base.start(::Squares) = 1
Base.next(S::Squares, state) = (state*state, state+1)
Base.done(S::Squares, s) = s > S.count;
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count;
Base is a module which defines many of the functions, types and macros used in the Julia language. You can view the files for everything it contains here or call whos(Base) to print a list.
In fact, these functions and types (which include things like sum and Int) are so fundamental to the language that they are included in Julia's top-level scope by default.
This means that we can just use sum instead of Base.sum every time we want to use that particular function. Both names refer to the same thing:
Julia> sum === Base.sum
true
Julia> #which sum # show where the name is defined
Base
So why, you might ask, is it necessary is write things like Base.start instead of simply start?
The point is that start is just a name. We are free to rebind names in the top-level scope to anything we like. For instance start = 0 will rebind the name 'start' to the integer 0 (so that it no longer refers to Base.start).
Concentrating now on the specific example in docs, if we simply wrote start(::Squares) = 1, then we find that we have created a new function with 1 method:
Julia> start
start (generic function with 1 method)
But Julia's iterator interface (invoked using the for loop) requires us to add the new method to Base.start! We haven't done this and so we get an error if we try to iterate:
julia> for i in Squares(7)
println(i)
end
ERROR: MethodError: no method matching start(::Squares)
By updating the Base.start function instead by writing Base.start(::Squares) = 1, the iterator interface can use the method for the Squares type and iteration will work as we expect (as long as Base.done and Base.next are also extended for this type).
I'll grant that for something so fundamental, the explanation is buried a bit far down in the documentation, but http://docs.julialang.org/en/release-0.4/manual/modules/#standard-modules describes this:
There are three important standard modules: Main, Core, and Base.
Base is the standard library (the contents of base/). All modules
implicitly contain using Base, since this is needed in the vast
majority of cases.

Immutable dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since i have a lot of parameters, which i will end up using on different places, my idea was to let this dict be global. And as we all know, global variables are not that great, so i wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do i do it? Do i have to do it?
Bonus answerable stuff :)
Is my code even ok? (it is the first thing i did with julia, and coming from c/c++ and python i have the tendencies to do things differently.) Do i need to check whether the file is actually open? Is my reading of the file "julia"-like? I could also readall and then use eachmatch. I don't see the "right way to do it" (like in python).
Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right value. The reason for this is because arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They go around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
First off, I am new to Julia (I have been using/learning it since only two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
specialization (a new dispatch?) of the the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. This should be the case of all other functions that end with ! and hence modify the data.
As far as I understood, It is only possible to have subtypes of abstract types. Therefore you can't make UnmodifiableDict as a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
you may also want to have a look at those threads.
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test(D)
for jdx = 1:1000
# D[2] = 2
for idx = 0.0:5:3600
a = D[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
#time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update Keep in mind though that in base Julia, unlike some other languages, it's not that you cannot redefine constants, instead, it's just that you get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test1()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D1[idx]
end
end
end
function test2()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D2[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
#time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

Using type classes to overload notation for constructors (now a namespace issue)

This is a derivative question of Existing constants (e.g. constructors) in type class instantiations.
The short question is this: How can I prevent the error that occurs due to free_constructors, so that I can combine the two theories that I include below.
I've been sitting on this for months. The other question helped me move forward (it appears). Thanks to the person who deserves thanks.
The real issue here is about overloading notation, though it looks like I now just have a namespace problem.
At this point, it's not a necessity, just an inconvenience that two theories have to be used. If the system allows, all this will disappear, but I ask anyway to make it possible to get a little extra information.
The big explanation here comes in explaining the motivation, which may lead to getting some extra information. I explain some, then include S1.thy, make a few comments, and then include S2.thy.
Motivation: using syntactic type classes for overloading notation of multiple binary datatypes
The basic idea is that I might have 5 different forms of binary words that have been defined with datatype, and I want to define some binary and hexadecimal notation that's overloaded for all 5 types.
I don't know what all is possible, but the past tells me (by others telling me things) that if I want code generation, then I should use type classes, to get the magic that comes with type classes.
The first theory, S1
Next is the theory S1.thy. What I do is instantiate bool for the type classes zero and one, and then use free_constructors to set up the notation 0 and 1 for use as the bool constructors True and False. It seems to work. This in itself is something I specifically wanted, but didn't know how to do.
I then try to do the same thing with an example datatype, BitA. It doesn't work because constant case_BitA is created when BitA is defined with datatype. It causes a conflict.
Further comments of mine are in the THY.
theory S1
imports Complex_Main
begin
declare[[show_sorts]]
(*---EXAMPLE, NAT 0: IT CAN BE USED AS A CONSTRUCTOR.--------------------*)
fun foo_nat :: "nat => nat" where
"foo_nat 0 = 0"
(*---SETTING UP BOOL TRUE & FALSE AS 0 AND 1.----------------------------*)
(*
I guess it works, because 'free_constructors' was used for 'bool' in
Product_Type.thy, instead of in this theory, like I try to do with 'BitA'.
*)
instantiation bool :: "{zero,one}"
begin
definition "zero_bool = False"
definition "one_bool = True"
instance ..
end
(*Non-constructor pattern error at this point.*)
fun foo1_bool :: "bool => bool" where
"foo1_bool 0 = False"
find_consts name: "case_bool"
free_constructors case_bool for "0::bool" | "1::bool"
by(auto simp add: zero_bool_def one_bool_def)
find_consts name: "case_bool"
(*found 2 constant(s):
Product_Type.bool.case_bool :: "'a∷type => 'a∷type => bool => 'a∷type"
S1.bool.case_bool :: "'a∷type => 'a∷type => bool => 'a∷type" *)
fun foo2_bool :: "bool => bool" where
"foo2_bool 0 = False"
|"foo2_bool 1 = True"
thm foo2_bool.simps
(*---TRYING TO WORK A DATATYPE LIKE I DID WITH BOOL.---------------------*)
(*
There will be 'S1.BitA.case_BitA', so I can't do it here.
*)
datatype BitA = A0 | A1
instantiation BitA :: "{zero,one}"
begin
definition "0 = A0"
definition "1 = A1"
instance ..
end
find_consts name: "case_BitA"
(*---ERROR NEXT: because there's already S1.BitA.case_BitA.---*)
free_constructors case_BitA for "0::BitA" | "1::BitA"
(*ERROR: Duplicate constant declaration "S1.BitA.case_BitA" vs.
"S1.BitA.case_BitA" *)
end
The second theory, S2
It seems that case_BitA is necessary for free_constructors to set things up, and it occurred to me that maybe I could get it to work by using datatype in one theory, and use free_constructors in another theory.
It seems to work. Is there a way I can combine these two theories?
theory S2
imports S1
begin
(*---HERE'S THE WORKAROUND. IT WORKS BECAUSE BitA IS IN S1.THY.----------*)
(*
I end up with 'S1.BitA.case_BitA' and 'S2.BitA.case_BitA'.
*)
declare[[show_sorts]]
find_consts name: "BitA"
free_constructors case_BitA for "0::BitA" | "1::BitA"
unfolding zero_BitA_def one_BitA_def
using BitA.exhaust
by(auto)
find_consts name: "BitA"
fun foo_BitA :: "BitA => BitA" where
"foo_BitA 0 = A0"
|"foo_BitA 1 = A1"
thm foo_BitA.simps
end
The command free_constructors always creates a new constant of the given name for the case expression and names the generated theorems in the same way as datatype does, because datatype internaly calls free_constructors.
Thus, you have to issue the command free_constructors in a context that changes the name space. For example, use a locale:
locale BitA_locale begin
free_constructors case_BitA for "0::BitA" | "1::BitA" ...
end
interpretation BitA!: BitA_locale .
After that, you can use both A0 and A1 as constructors in pattern matching equations and 0 and 1, but you should not mix them in a single equation. Yet, A0 and 0 are still different constants to Isabelle. This means that you may have to manually convert the one into the other during proofs and code generation works only for one of them. You would have to set up the code generator to replace A0 with 0 and A1 with 1 (or vice versa) in the code equations. To that end, you want to declare the equations A0 = 0 and A1 = 1 as [code_unfold], but you also probably want to write your own preprocessor function in ML that replaces A0 and A1 in left-hand sides of code equations, see the code generator tutorial for details.
Note that if BitA was a polymorphic datatype, packages such as BNF and lifting would continue to use the old set of constructors.
Given these problems, I would really go for the manual definition of the type as described in my answer to another question. This saves you a lot of potential issues later on. Also, if you are really only interested in notation, you might want to consider adhoc_overloading. It works perfectly well with code generation and is more flexible than type classes. However, you cannot talk about the overloaded notation abstractly, i.e., every occurrence of the overloaded constant must be disambiguated to a single use case. In terms of proving, this should not be a restriction, as you assume nothing about the overloaded constant. In terms of definitions over the abstract notation, you would have to repeat the overloading there as well (or abstract over the overloaded definitions in a locale and interpret the locale several times).

Abstracting over Collection Types

Is there a possibility of writing functions which are generic in respect to collection types they support other than using the seq module?
The goal is, not having to resort to copy and paste when adding new collection functions.
Generic programming with collections can be handled the same way generic programming is done in general: Using generics.
let f (map_fun : ('T1 -> 'T2) -> 'T1s -> 'T2s) (iter_fun : ('T2 -> unit) -> 'T2s -> unit) (ts : 'T1s) (g : 'T1 -> 'T2) (h : 'T2 -> unit)=
ts
|> map_fun g
|> iter_fun h
type A =
static member F(ts, g, h) = f (Array.map) (Array.iter) ts g h
static member F(ts, g, h) = f (List.map) (List.iter) ts g h
A bit ugly and verbose, but it's possible. I'm using a class and static members to take advantage of overloading. In your code, you can just use A.F and the correct specialization will be called.
For a prettier solution, see https://stackoverflow.com/questions/979084/what-features-would-you-add-remove-or-change-in-f/987569#987569 Although this feature is enabled only for the core library, it should not be a problem to modify the compiler to allow it in your code. That's possible because the source code of the compiler is open.
The seq<'T> type is the primary way of writing computations that work for any collections in F#. There are a few ways you can work with the type:
You can use functions from the Seq module (such as Seq.filter, Seq.windowed etc.)
You can use sequence comprehensions (e.g. seq { for x in col -> x * 2 })
You can use the underlying (imperative) IEnumerator<'T> type, which is sometimes needed e.g. if you want to implement your own zipping of collections (this is returned by calling GetEnumerator)
This is relatively simple type and it can be used only for reading data from collections. As the result, you'll always get a value of type seq<'T> which is essentially a lazy sequence.
F# doesn't have any mechanism for transforming collections (e.g. generic function taking collection C to collection C with new values) or any mechanism for creating collections (which is available in Haskell or Scala).
In most of the practical cases, I don't find that a problem - most of the work can be done using seq<'T> and when you need a specialized collection (e.g. array for performance), you typically need a slightly different implementation anyway.

Resources