Dict with array key in julia - dictionary

In julia language (ver 1.1.0), I am experimenting what would happen when I mutate a dictionary key.
Before mutation, both the variable x and [1,2,3] is recognized.
x = [1,2,3]; d = Dict(x=>"x")
haskey(d, x)
# true
haskey(d, [1,2,3])
# true
Once I mutate x, neither the variable x nor [1,2,3,4] is recognized.
push!(x, 4)
haskey(d, x)
# false
haskey(d, [1,2,3,4])
# false
haskey(d, [1,2,3])
# false
Value-wise, the key is "equal" to x, so I guess this has something to do with the hash function, but could not understand the source code.
collect(keys(d))[1] == x == [1,2,3,4]
# true
Can someone explain what makes this behavior, or suggest resources that I should look at?

The key function to look into is ht_keyindex.
There you can see that in order for the key to be found it must both:
Match hash value (via hashindex).
Match identity or value.
There is a non-negligible probability that after mutating x it will have the same hashindex value and the key would be found. For example here you could set index 4 of x to 5 and all seemingly would work:
julia> x[4] = 5
5
julia> x
4-element Array{Int64,1}:
1
2
3
5
julia> haskey(d, x)
true
Therefore - as in any programming language supporting dictionaries in a similar way - mutating keys of the dictionary should not be done. The above discussion should be in practice only a theoretical one.

Related

How to count minimum on multiple variables in Julia

I want to plot correct y-axis limits. So, require to count the maximum y and minimum y.
y1=[2 3 4]
y2=[7 5 6]
...
m = minimum(y1)
m = minimum(m, minimum(y2))
error message
ERROR: MethodError: objects of type Int64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
[1] mapreduce_first(f::Int64, op::Function, x::Int64)
# Base ./reduce.jl:419
[2] mapreduce(f::Int64, op::Function, a::Int64)
# Base ./reduce.jl:446
[3] minimum(f::Int64, a::Int64; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
# Base ./reduce.jl:725
[4] minimum(f::Int64, a::Int64)
# Base ./reduce.jl:725
[5] top-level scope
# REPL[107]:1
Previous code is just a simplified code, in my case, it require parse and get from a loop the pseudo code like:
x_data, y_data, names, y_min, y_max = [], [], [], 100, 0
for filename in *.csv
df = parse_csv(filename) # df is a dataframe
push!(names, filename)
d = df.value
y_min = minimum(y_min, d)
....
end
# plot all file by the y_min, y_max
i=1
for d in y_data;
lineplot(x_data, d, ylims=(y_min,y_max), name=names[i])
i += 1
end
Here are two solutions:
First. This is simple, just take the minimums of the minimums, etc.
julia> min(minimum(y1), minimum(y2))
2
julia> max(maximum(y1), maximum(y2))
7
Second solution. This iterates over each pair of values from y1 and y2, takes the minimum/maximum of each pair, and then finds the minimum of those again.
julia> minimum(minimum, zip(y1, y2))
2
julia> maximum(maximum, zip(y1, y2))
7
Here's a third one:
julia> min(y1..., y2...)
2
julia> max(y1..., y2...)
7
Elegant, but splatting of vectors is often inefficient in terms of performance.
The problem is that you don't know the difference between the min function and the minimum function (or you're unaware of the min function):
minimum(itr; [init])
Returns the smallest element in a collection.
So it gets a collection (E.g., Array) and returns the minimum of it.
min(x, y, ...)
Return the minimum of the arguments.
This one gets indefinite arguments and returns the minimum of them! It can't apply min on the x if the x is a container by itself!
julia> min(2, 3)
2
julia> min([2, 3])
ERROR: MethodError: no method matching min(::Vector{Int64})
On the other hand, for the minimum function:
julia> minimum(2, 3)
ERROR: MethodError: objects of type Int64 are not callable
julia> minimum([2, 3])
2
So I wanted to explain these to you to understand your code's meaning better.
We have this minimum(m, minimum(y2)) expression in your code block. This is literally the same as minimum(2, 5). So you're not passing containers to the function, leading to an error! For this, you should choose min instead:
julia> m = min(m, minimum(y2))
2
Or we can wrap m and minimum(y2) in a container and use the minimum function to achieve the overall min:
julia> m = minimum([m, minimum(y2)])
2
If you follow the explanation, you can absolutely understand the following:
julia> min(m, minimum(y2)) == minimum([m, minimum(y2)]) == min(m, min(y2...))
true

Substitute of "occursin" function to find a string in an Array{String,1}

What I am trying to do is
i = occursin("ENTITIES\n", lines)
i != 0 || error("ENTITIES section not found")
The error information is
ERROR: LoadError: LoadError: MethodError: no method matching occursin(::String, ::Array{String,1})
Closest candidates are:
occursin(::Union{AbstractChar, AbstractString}, ::AbstractString) at strings/search.jl:452
This is a piece of julia v0.6 code. I am using v1.1 now. I am new to julia and don't know what's the proper subsititute function for this. Please help.
You can broadcast orrursin like this (add a . after function name):
julia> x = "abc"
"abc"
julia> y = ["abc", "xyz"]
2-element Array{String,1}:
"abc"
"xyz"
julia> b = occursin.(x, y)
2-element BitArray{1}:
true
false
julia> findall(b)
1-element Array{Int64,1}:
1
julia> findfirst(b)
1
Note that although String can be iterated over it is treated by broadcast as a scalar.
Also it is worth to remember that occursin returns Bool value so that you can use it directly in logical tests e.g. i || error("ENTITIES section not found") in the code from your question.
In order to locate the index in the collection of the occurrence of true in the return value of broadcasted occursin use findall or findfirst functions (there is also findlast). The difference is that findall returns a vector of entries where true is encountered in the collection, while findfirst returns the first such entry only. Also note the difference when you pass all falses to it. findall will return an empty vector and findfirst will return nothing.
If you do not want to retain the vector b in the code above, you can get the indices directly (this should be faster) by passing a predicate as a first argument to findall/findfirst:
julia> findall(t -> occursin(x, t), y)
1-element Array{Int64,1}:
1
julia> findfirst(t -> occursin(x, t), y)
1

Re. partitions()

Why is
julia> collect(partitions(1,2))
0-element Array{Any,1}
returned instead of
2-element Array{Any,1}:
[0,1]
[1,0]
and do I really have to
x = collect(partitions(n,m));
y = Array(Int64,length(x),length(x[1]));
for i in 1:length(x)
for j in 1:length(x[1])
y[i,j] = x[i][j];
end
end
to convert the result to a two-dimensional array?
From the wikipedia:
In number theory and combinatorics, a partition of a positive integer n, also called an integer partition, is a way of writing n as a sum of positive integers.
For array conversion, try:
julia> x = collect(partitions(5,3))
2-element Array{Any,1}:
[3,1,1]
[2,2,1]
or
julia> x = partitions(5,3)
Base.FixedPartitions(5,3)
then
julia> hcat(x...)
3x2 Array{Int64,2}:
3 2
1 2
1 1
Here's another approach to your problem that I think is a little simpler, using the Combinatorics.jl library:
multisets(n, k) = map(A -> [sum(A .== i) for i in 1:n],
with_replacement_combinations(1:n, k))
This allocates a bunch of memory, but I think your current approach does too. Maybe it would be useful to make a first-class version and add it to Combinatorics.jl.
Examples:
julia> multisets(2, 1)
2-element Array{Array{Int64,1},1}:
[1,0]
[0,1]
julia> multisets(3, 5)
21-element Array{Array{Int64,1},1}:
[5,0,0]
[4,1,0]
[4,0,1]
[3,2,0]
[3,1,1]
[3,0,2]
[2,3,0]
[2,2,1]
[2,1,2]
[2,0,3]
⋮
[1,2,2]
[1,1,3]
[1,0,4]
[0,5,0]
[0,4,1]
[0,3,2]
[0,2,3]
[0,1,4]
[0,0,5]
The argument order is backwards from yours to match mathematical convention. If you prefer the other way, that can easily be changed.
one robust solution can be achieved using lexicographic premutations generation algorithm, originally By Donald Knuth plus classic partitions(n).
that is lexicographic premutations generator:
function lpremutations{T}(a::T)
b=Vector{T}()
sort!(a)
n=length(a)
while(true)
push!(b,copy(a))
j=n-1
while(a[j]>=a[j+1])
j-=1
j==0 && return(b)
end
l=n
while(a[j]>=a[l])
l-=1
end
tmp=a[l]
a[l]=a[j]
a[j]=tmp
k=j+1
l=n
while(k<l)
tmp=a[k]
a[k]=a[l]
a[l]=tmp
k+=1
l-=1
end
end
end
The above algorithm will generates all possible unique
combinations of an array elements with repetition:
julia> lpremutations([2,2,0])
3-element Array{Array{Int64,1},1}:
[0,2,2]
[2,0,2]
[2,2,0]
Then we will generate all integer arrays that sum to n using partitions(n) (forget the length of desired arrays m), and resize them to the lenght m using resize_!
function resize_!(x,m)
[x;zeros(Int,m-length(x))]
end
And main function looks like:
function lpartitions(n,m)
result=[]
for i in partitions(n)
append!(result,lpremutations(resize_!(i, m)))
end
result
end
Check it
julia> lpartitions(3,4)
20-element Array{Any,1}:
[0,0,0,3]
[0,0,3,0]
[0,3,0,0]
[3,0,0,0]
[0,0,1,2]
[0,0,2,1]
[0,1,0,2]
[0,1,2,0]
[0,2,0,1]
[0,2,1,0]
[1,0,0,2]
[1,0,2,0]
[1,2,0,0]
[2,0,0,1]
[2,0,1,0]
[2,1,0,0]
[0,1,1,1]
[1,0,1,1]
[1,1,0,1]
[1,1,1,0]
The MATLAB script from http://www.mathworks.com/matlabcentral/fileexchange/28340-nsumk actually behaves the way I need, and is what I though that partitions() would do from the description given. The Julia version is
# k - sum, n - number of non-negative integers
function nsumk(k,n)
m = binomial(k+n-1,n-1);
d1 = zeros(Int16,m,1);
d2 = collect(combinations(collect((1:(k+n-1))),n-1));
d2 = convert(Array{Int16,2},hcat(d2...)');
d3 = ones(Int16,m,1)*(k+n);
dividers = [d1 d2 d3];
return diff(dividers,2)-1;
end
julia> nsumk(3,2)
4x2 Array{Int16,2}:
0 3
1 2
2 1
3 0
using daycaster's lovely hcat(x...) tidbit :)
I still wish there would be a more compact way of doing this.
The the first mention of this approach seem to be https://au.mathworks.com/matlabcentral/newsreader/view_thread/52610, and as far as I can understand it is based on the "stars and bars" method https://en.wikipedia.org/wiki/Stars_and_bars_(combinatorics)

Check if an Object is an Array or a Dict

I'd like to check if var is an Array or a Dict.
typeof(var) == Dict
typeof(var) == Array
But it doesn't work because typeof is too precise: Dict{ASCIIString,Int64}.
What's the best way ?
If you need a "less precise" check, you may want to consider using the isa() function, like this:
julia> d = Dict([("A", 1), ("B", 2)])
julia> isa(d, Dict)
true
julia> isa(d, Array)
false
julia> a = rand(1,2,3);
julia> isa(a, Dict)
false
julia> isa(a, Array)
true
The isa() function could then be used in control flow constructs, like this:
julia> if isa(d, Dict)
println("I'm a dictionary!")
end
I'm a dictionary!
julia> if isa(a, Array)
println("I'm an array!")
end
I'm an array!
Note: Tested with Julia 0.4.3
Instead of checking for a particular concrete type, such as Array, or Dict, you might do better by checking for the abstract types, and gain a lot of flexibility.
For example:
julia> x = [1,2,3]
3-element Array{Int64,1}:
1
2
3
julia> d = Dict(:a=>1,:b=>2)
Dict(:a=>1,:b=>2)
julia> isa(d, Associative)
true
julia> isa(x, AbstractArray)
true
There are many different types of arrays in Julia, so checking for Array is likely to be too restrictive, you won't get sparse matrices, for example.
There are also a number of different types of associative structures, Dict, ObjectIdDict, SortedDict, OrderedDict.

Nested list comprehensions in Julia

In python I can do nested list comprehensions, for instance I can flatten the following array thus:
a = [[1,2,3],[4,5,6]]
[i for arr in a for i in arr]
to get [1,2,3,4,5,6]
If I try this syntax in Julia I get:
julia> a
([1,2,3],[4,5,6],[7,8,9])
julia> [i for arr in a for i in arr]
ERROR: syntax: expected ]
Are nested list comprehensions in Julia possible?
This feature has been added in julia v0.5:
julia> a = ([1,2,3],[4,5,6],[7,8,9])
([1,2,3],[4,5,6],[7,8,9])
julia> [i for arr in a for i in arr]
9-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
List comprehensions work a bit differently in Julia:
> [(x,y) for x=1:2, y=3:4]
2x2 Array{(Int64,Int64),2}:
(1,3) (1,4)
(2,3) (2,4)
If a=[[1 2],[3 4],[5 6]] was a multidimensional array, vec would flatten it:
> vec(a)
6-element Array{Int64,1}:
1
2
3
4
5
6
Since a contains tuples, this is a bit more complicated in Julia. This works, but likely isn't the best way to handle it:
function flatten(x, y)
state = start(x)
if state==false
push!(y, x)
else
while !done(x, state)
(item, state) = next(x, state)
flatten(item, y)
end
end
y
end
flatten(x)=flatten(x,Array(Any, 0))
Then, we can run:
> flatten([(1,2),(3,4)])
4-element Array{Any,1}:
1
2
3
4
You can get some mileage out of using the splat operator with the array constructor here (transposing to save space)
julia> a = ([1,2,3],[4,5,6],[7,8,9])
([1,2,3],[4,5,6],[7,8,9])
julia> [a...]'
1x9 Array{Int64,2}:
1 2 3 4 5 6 7 8 9
Any reason why you're using a tuple of vectors? It's much simpler with arrays, as Ben has already shown with vec. But you can also use comprehensions pretty simply in either case:
julia> a = ([1,2,3],[4,5,6],[7,8,9]);
julia> [i for i in hcat(a...)]
9-element Array{Any,1}:
1
2
⋮
The expression hcat(a...) "splats" your tuple and concatenates it into an array. But remember that, unlike Python, Julia uses column-major array semantics. You have three column vectors in your tuple; is that what you intend? (If they were row vectors — delimited by spaces — you could just use [a...] to do the concatenation). Arrays are iterated through all elements, regardless of their dimensionality.
Don't have enough reputation for comment so posting a modification #ben-hammer. Thanks for the example of flatten(), it was helpful to me.
But it did break if the tuples/arrays contained strings. Since strings are iterables the function would further break them down to characters. I had to insert condition to check for ASCIIString to fix that. The code is below
function flatten(x, y)
state = start(x)
if state==false
push!(y, x)
else
if typeof(x) <: String
push!(y, x)
else
while (!done(x, state))
(item, state) = next(x, state)
flatten(item, y)
end
end
end
y
end
flatten(x)=flatten(x,Array(Any, 0))

Resources