Erlang Hash Tree - networking

I'm working on a p2p app that uses hash trees.
I am writing the hash tree construction functions (publ/4 and publ_top/4) but I can't see how to fix publ_top/4.
I try to build a tree with publ/1:
nivd:publ("file.txt").
prints hashes...
** exception error: no match of right hand side value [67324168]
in function nivd:publ_top/4
in call from nivd:publ/1
The code in question is here:
http://github.com/AndreasBWagner/nivoa/blob/886c624c116c33cc821b15d371d1090d3658f961/nivd.erl
Where do you think the problem is?
Thank You,
Andreas

Looking at your code I can see one issue that would generate that particular exception error
publ_top(_,[],Accumulated,Level) ->
%% Go through the accumulated list of hashes from the prior level
publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(FullLevelLen,RestofLevel,Accumulated,Level) ->
case FullLevelLen =:= 1 of
false -> [F,S|T]=RestofLevel,
io:format("~w---~w~n",[F,S]),
publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
true -> done
end.
In the first function declaration you match against the empty list. In the second declaration you match against a list of length (at least) 2 ([F,S|T]). What happens when FullLevelLen is different from 1 and RestOfLevel is a list of length 1? (Hint: You'll get the above error).
The error would be easier to spot if you would pattern match on the function arguments, perhaps something like:
publ_top(_,[],Accumulated,Level) ->
%% Go through the accumulated list of hashes from the prior level
publ_top(string:len(Accumulated),Accumulated,[],Level+1);
publ_top(1, _, _, _) ->
done;
publ_top(_, [F,S|T], Accumulated, Level) ->
io:format("~w---~w~n",[F,S]),
publ_top(FullLevelLen,T,lists:append(Accumulated,[erlang:phash2(string:concat([F],[S]))]),Level);
%% Missing case:
% publ_top(_, [H], Accumulated, Level) ->
% ...

Related

Dialyzer does not catch errors on returned functions

Background
While playing around with dialyzer, typespecs and currying, I was able to create an example of a false positive in dialyzer.
For the purposes of this MWE, I am using diallyxir (versions included) because it makes my life easier. The author of dialyxir confirmed this was not a problem on their side, so that possibility is excluded for now.
Environment
$ elixir -v
Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Elixir 1.13.2 (compiled with Erlang/OTP 24)
Which version of Dialyxir are you using? (cat mix.lock | grep dialyxir):
"dialyxir": {:hex, :dialyxir, "1.1.0", "c5aab0d6e71e5522e77beff7ba9e08f8e02bad90dfbeffae60eaf0cb47e29488", [:mix], [{:erlex, ">= 0.2.6", [hex: :erlex, repo: "hexpm", optional: false]}], "hexpm", "07ea8e49c45f15264ebe6d5b93799d4dd56a44036cf42d0ad9c960bc266c0b9a"},
"erlex": {:hex, :erlex, "0.2.6", "c7987d15e899c7a2f34f5420d2a2ea0d659682c06ac607572df55a43753aa12e", [:mix], [], "hexpm", "2ed2e25711feb44d52b17d2780eabf998452f6efda104877a3881c2f8c0c0c75"},
Current behavior
Given the following code sample:
defmodule PracticingCurrying do
#spec greater_than(integer()) :: (integer() -> String.t())
def greater_than(min) do
fn number -> number > min end
end
end
Which clearly has a wrong typing, I get a success message:
$ mix dialyzer
Compiling 1 file (.ex)
Generated grokking_fp app
Finding suitable PLTs
Checking PLT...
[:compiler, :currying, :elixir, :gradient, :gradualizer, :kernel, :logger, :stdlib, :syntax_tools]
Looking up modules in dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Finding applications for dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Finding modules for dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Checking 518 modules in dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
Adding 44 modules to dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt
done in 0m24.18s
No :ignore_warnings opt specified in mix.exs and default does not exist.
Starting Dialyzer
[
check_plt: false,
init_plt: '/home/user/Workplace/fl4m3/grokking_fp/_build/dev/dialyxir_erlang-24.2.1_elixir-1.13.2_deps-dev.plt',
files: ['/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.ImmutableValues.beam',
'/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.PracticingCurrying.beam',
'/home/user/Workplace/fl4m3/grokking_fp/_build/dev/lib/grokking_fp/ebin/Elixir.TipCalculator.beam'],
warnings: [:unknown]
]
Total errors: 0, Skipped: 0, Unnecessary Skips: 0
done in 0m1.02s
done (passed successfully)
Expected behavior
I expected dialyzer to tell me the correct spec is #spec greater_than(integer()) :: (integer() -> bool()).
As a side note (and comparison, if you will) gradient does pick up the error.
I know that comparing these tools is like comparing oranges and apples, but I think it is still worth mentioning.
Questions
Is dialyzer not intended to catch this type of error?
If it should catch the error, what can possibly be failing? (is it my example that is incorrect, or something inside dialyzer?)
I personally find it hard to believe this could be a bug in Dialyzer, the tool has been used rather extensively by a lot of people for me to be the first to discover this error. However, I cannot explain what is happening.
Help is appreciated.
Dialyzer is pretty optimistic in its analysis and ignores some categories of errors.
This article provides some advanced explanations about its approach and limitations.
In the particular case of anonymous functions, dialyzer seems to perform a very minimal check
when they are being declared: it will ignore both the types of its arguments and return type, e.g.
the following doesn't lead any error even if is clearly wrong:
# no error
#spec add(integer()) :: (String.t() -> String.t())
def add(x) do
fn y -> x + y end
end
It will however point out a mismatch in arity, e.g.
# invalid_contract
# The #spec for the function does not match the success typing of the function.
#spec add2(integer()) :: (integer(), integer() -> integer())
def add2(x) do
fn y -> x + y end
end
Dialyzer might be able to detect a type conflict when trying to use the anonymous function,
but this isn't guaranteed (see article above), and the error message might not be helpful:
# Function main/0 has no local return.
def main do
positive? = greater_than(0)
positive?.(2)
end
We don't know what is the problem exactly, not even the line causing the error. But at least we know there is one and can debug it.
In the following example, the error is a bit more informative (using :lists.map/2 instead of Enum.map/2 because
dialyzer doesn't understand the enumerable protocol):
# Function main2/0 has no local return.
def main2 do
positive? = greater_than(0)
# The function call will not succeed.
# :lists.map(_positive? :: (integer() -> none()), [-2 | 0 | 1, ...])
# will never return since the success typing arguments are
# ((_ -> any()), [any()])
:lists.map(positive?, [1, 0, -2])
end
This tells us that dialyzer inferred the return type of greater_than/1 to be (integer() -> none()).
none is described in the article above as:
This is a special type that means that no term or type is valid.
Usually, when Dialyzer boils down the possible return values of a function to none(), it means the function should crash.
It is synonymous with "this stuff won't work."
So dialyzer knows that this function cannot be called successfully, but doesn't consider it to be a type clash until actually called, so it will allow the declaration (in the same way you can perfectly create a function that just raises).
Disclaimer: I couldn't find an official explanation regarding how dialyzer handles anonymous
functions in detail, so the explanations above are based on my observations and interpretation

SProxy in purescript?

What's the use of Sproxy in purescript?
In Pursuit, it's written as
data SProxy (sym :: Symbol)
--| A value-level proxy for a type-level symbol.
and what is meant by Symbol in purescipt?
First, please note that PureScript now has polykinds since version 0.14 and most functions now use Proxy instead of SProxy. Proxy is basically a generalisation of SProxy.
About Symbols and Strings
PureScript knows value level strings (known as String) and type level strings (known as Symbol).
A String can have any string value at runtime. The compiler does not track the value of the string.
A Symbol is different, it can only have one value (but remember, it is on the type level). The compiler keeps track of this string. This allows the compiler to type check certain expressions.
Symbols in Practice
The most prominent use of Symbols is in records. The difference between a Record and a String-Map is that the compiler knows about the keys at compile time and can typecheck lookups.
Now, sometimes we need to bridge the gap between these two worlds: The type level and the value level world. Maybe you know that PureScript records are implemented as JavaScript objects in the official compiler. This means we need to somehow receive a string value from our symbol. The magical function reflectSymbol allows us to turn a symbol into a string. But a symbol is on the type level. This means we can only write a symbol where we can write types (so for example in type definition after ::). This is where the Proxy hack comes in. The SProxy is a simple value that "stores" the type by applying it.
For example the get function from purescript-records allows us to get a value at a property from a record.
get :: forall proxy r r' l a. IsSymbol l => Cons l a r' r => proxy l -> Record r -> a
If we apply the first paramerter we get:
get (Proxy :: Proxy "x") :: forall r a. { x :: a | r } -> a
Now you could argue that you can get the same function by simply writing:
_.x :: forall r a. { x :: a | r } -> a
It has exactly the same type. This leads to one last question:
But why?
Well, there are certain meta programming szenarios, where you don't programm for a specific symbol, but rather for any symbol. Imagine you want to write a JSON serialiser for any record. You might want to "iterate" over every property of the record, get the value, turn the value itself into JSON and then concatinate the key value pair with all the other keys and values.
An example for such an implementation can be found here
This is maybe not the most technical explanation of it all, but this is how I understand it.

How can I get this function to be tail-recursive?

I'm still trying to implement 2-3 finger trees and I made good progress (repository). While doing some benchmarks I found out that my quite basic toList results in a StackOverflowException when the tree ist quite large. At first I saw an easy fix and made it tail-recursive.
Unfortunately, it turned out that toList wasn't the culprit but viewr was:
/// Return both the right-most element and the remaining tree (lazily).
let rec viewr<'a> : FingerTree<'a> -> View<'a> = function
| Empty -> Nil
| Single x -> View(x, lazyval Empty)
| Deep(prefix, deeper, One x) ->
let rest = lazy (
match viewr deeper.Value with
| Nil ->
prefix |> Digit.promote
| View (node, lazyRest) ->
let suffix = node |> Node.toList |> Digit.ofList
Deep(prefix, lazyRest, suffix)
)
View(x, rest)
| Deep(prefix, deeper, Digit.SplitLast(shorter, x)) ->
View(x, lazy Deep(prefix, deeper, shorter))
| _ -> failwith Messages.patternMatchImpossible
Looking for the only recursive call it is obvious that this is is not tail-recursive. Somehow I hoped this problem wouldn't exist because that call is wrapped in a Lazy which IMHO is similar to a continuation.
I heard and read of continuations but so far never (had to) use(d) them. I guess here I really need to. I've been staring at the code for quite some time, putting function parameters here and there, calling them other places… I'm totally lost!
How can this be done?
Update: The calling code looks like this:
/// Convert a tree to a list (left to right).
let toList tree =
let rec toList acc tree =
match viewr tree with
| Nil -> acc
| View(head, Lazy tail) -> tail |> toList (head::acc)
toList [] tree
Update 2: The code that caused the crash is this one.
let tree = seq {1..200000} |> ConcatDeque.ofSeq
let back = tree |> ConcatDeque.toList
The tree get built fine, I checked and it is only 12 levels deep. It's the call in line 2 that triggered the overflow.
Update 3: kvb was right, that pipe issue I ran into before has something to do with this. Re-testing the cross product of debug/release and with/without pipe it worked in all but one case: debug mode with the pipe operator crashed. The behavior was the same for 32 vs. 64 bit.
I'm quite sure that I was running release mode when posting the question but today it's working. Maybe there was some other factor… Sorry about that.
Although the crash is solved, I'm leaving the question open out of theoretical interest. After all, we're here to learn, aren't we?
So let me adapt the question:
From looking at the code, viewr is definitely not tail-recursive. Why doesn't it always blow up and how would one rewrite it using continuations?
Calling viewr never results in an immediate recursive call to viewr (the recursive call is protected by lazy and is not forced within the remainder of the call to viewr), so there's no need to make it tail recursive to prevent the stack from growing without bound. That is, a call to viewr creates a new stack frame which is then immediately popped when viewr's work is done; the caller can then force the lazy value resulting in a new stack frame for the nested viewr call, which is then immediately popped again, etc., so repeating this process doesn't result in a stack overflow.

PostScript forall on dictionaries

According to the PLRM it doesn't matter in which order you execute a forall on a dict:
(p. 597) forall pushes a key and a value on the operand stack and executes proc for each key-value pair in the dictionary
...
(p. 597) The order in which forall enumerates the entries in the dictionary is arbitrary. New entries put in the dictionary during the execution of proc may or may not be included in the enumeration. Existing entries removed from the dictionary by proc will not be encountered later in the enumeration.
Now I was executing some code:
/d 5 dict def
d /abc 123 put
d { } forall
My output (operand stack) is:
--------top-
/abc
123
-----bottom-
The output of ghostscript and PLRM (operand stack) is:
--------top-
123
/abc
-----bottom-
Does it really not matter in what order you process the key-value pairs of the dict?
on the stack, do you first need to push the value and then the key, or do you need to push the key first? (as the PLRM only talks about "a key and a value", but doesnt tell you anything about the order).
Thanks in advance
It would probably help if you quoted the page number qhen you quote sections from the PLRM, its hard to see where you are getting this from.
When executing forall the order in which forall enumerates the dictionary pairs is arbitrary, you have no influence over it. However forall always pushes the key and then the value. Even if this is implied in the text you (didn't quite) quote, you can see from the example in the forall operator that this is hte case.
when you say 'my output' do you mean you are writing your own PostScript interpreter ? If so then your output is incorrect, when pushing a key/value pair the key is pushed first.

Tail Recursions in erlang

I'm learning Erlang from the very basic and have a problem with a tail recursive function. I want my function to receive a list and return a new list where element = element + 1. For example, if I send [1,2,3,4,5] as an argument, it must return [2,3,4,5,6]. The problem is that when I send that exact arguments, it returns [[[[[[]|2]|3]|4]|5]|6].
My code is this:
-module(test).
-export([test/0]).
test()->
List = [1,2,3,4,5],
sum_list_2(List).
sum_list_2(List)->
sum_list_2(List,[]).
sum_list_2([Head|Tail], Result)->
sum_list_2(Tail,[Result|Head +1]);
sum_list_2([], Result)->
Result.
However, if I change my function to this:
sum_list_2([Head|Tail], Result)->
sum_list_2(Tail,[Head +1|Result]);
sum_list_2([], Result)->
Result.
It outputs [6,5,4,3,2] which is OK. Why the function doesn't work the other way around([Result|Head+1] outputing [2,3,4,5,6])?
PS: I know this particular problem is solved with list comprehensions, but I want to do it with recursion.
For this kind of manipulation you should use list comprehension:
1> L = [1,2,3,4,5,6].
[1,2,3,4,5,6]
2> [X+1 || X <- L].
[2,3,4,5,6,7]
it is the fastest and most idiomatic way to do it.
A remark on your fist version: [Result|Head +1] builds an improper list. the construction is always [Head|Tail] where Tail is a list. You could use Result ++ [Head+1] but this would perform a copy of the Result list at each recursive call.
You can also look at the code of lists:map/2 which is not tail recursive, but it seems that actual optimization of the compiler work well in this case:
inc([H|T]) -> [H+1|inc(T)];
inc([]) -> [].
[edit]
The internal and hidden representation of a list looks like a chained list. Each element contains a term and a reference to the tail. So adding an element on top of the head does not need to modify the existing list, but adding something at the end needs to mutate the last element (the reference to the empty list is replaced by a reference to the new sublist). As variables are not mutable, it needs to make a modified copy of the last element which in turn needs to mutate the previous element of the list and so on. As far as I know, the optimizations of the compiler do not make the decision to mutate variable (deduction from the the documentation).
The function that produces the result in reverse order is a natural consequence of you adding the newly incremented element to the front of the Result list. This isn't uncommon, and the recommended "fix" is to simply list:reverse/1 the output before returning it.
Whilst in this case you could simply use the ++ operator instead of the [H|T] "cons" operator to join your results the other way around, giving you the desired output in the correct order:
sum_list_2([Head|Tail], Result)->
sum_list_2(Tail, Result ++ [Head + 1]);
doing so isn't recommended because the ++ operator always copies it's (increasingly large) left hand operand, causing the algorithm to operate in O(n^2) time instead of the [Head + 1 | Tail] version's O(n) time.

Resources