How many levels does an arbitrary Tcl dictionary have?

Let us say that I have a dictionary in Tcl:
set dicty [dict create 1 2 4 "5 6"]
So, in the mind of the person creating the dictionary, there is no nesting: the value of 4 is a string. Not according to the Tcl engine, though. See the shell result:
puts [ dict get [ dict get $dicty 4 ] 5 ]
> 6
Which means that the string got auto-converted to a dictionary. Needless to say, in languages such as Perl, Ruby, Python, this is easily tackled.
Any ideas?

Both the keys and the values in Tcl's dictionaries are arbitrary values, and so may be dictionaries. It's not very useful to use dictionaries as keys, but nested dictionaries are fully supported (and may be nested very deeply; I've made it to over a million deep, though this isn't recommended for comprehensibility reasons!):
set theDict {a b c {d e f {g h} i j} k l}
puts [dict get $theDict "c" "f" "g"]
# You can also use multiple keys with [dict set] and [dict exists]
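Tcl does not distinguish at the formal type level between strings, lists and dictionaries. This is by design. All Tcl values are fully serializable by default, and will be automatically converted between types behind the scenes as necessary, according to what operations were used on them last. That is also why the number of levels cannot be read off the value itself: "5 6" is a string, a two-element list, and a one-entry dictionary all at once, and only your code knows which reading it intends.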
Write what you mean to say and stop worrying about the types: they'll come right for you.
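The multi-key forms mentioned above can be sketched as follows. This is a minimal illustration, assuming Tcl 8.5 or later; the replacement value H is made up for the example.

```tcl
set theDict {a b c {d e f {g h} i j} k l}

# [dict exists] takes a key path and reports whether [dict get] would succeed.
puts [dict exists $theDict c f g]   ;# 1
puts [dict exists $theDict c f x]   ;# 0

# [dict set] with a key path replaces the nested entry stored in the variable.
dict set theDict c f g H
puts [dict get $theDict c f g]      ;# H

# The value from the question works as a dictionary and as a string,
# depending only on which command is applied to it.
set dicty [dict create 1 2 4 "5 6"]
puts [dict get $dicty 4 5]                  ;# 6
puts [string length [dict get $dicty 4]]    ;# 3
```

Nothing in the value records how deep the "intended" nesting is, so the depth has to come from the code that reads it.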

Related

Apache Arrow for Trie Data

I am looking into using Arrow for read-heavy operations on trie data structures. I'm slightly hesitant about using Arrow since I can't really see a natural representation of the data in terms of columns. Specifically, the data I work with can be viewed as a trie where the keys are tuples of strings, ints, etc. and all the values are at the leaves. An example:
        .
    a /   \ 1
     .     .
 2 /  \ c   \ 3
3.0  "hi"  [arr of int]
The key set for any given trie can vary. Indeed, I am actually dealing with many different tries, each with slightly different keysets and corresponding leaf values.
The end goal would be to have
a) A means to read slices of a trie into memory without loading everything into memory
b) The ability to reconstruct tries (can be expensive) if needed.
I should mention that I am considering HDF5 as an alternative. If it is important, I am working in Julia.

Elixir allows assigning to a variable twice [duplicate]

In Dave Thomas's book Programming Elixir he states "Elixir enforces immutable data" and goes on to say:
In Elixir, once a variable references a list such as [1,2,3], you know it will always reference those same values (until you rebind the variable).
This sounds like "it won't ever change unless you change it" so I'm confused as to what the difference between mutability and rebinding is. An example highlighting the differences would be really helpful.
Don't think of "variables" in Elixir as variables in imperative languages, "spaces for values". Rather look at them as "labels for values".
Maybe you would better understand it when you look at how variables ("labels") work in Erlang. Whenever you bind a "label" to a value, it remains bound to it forever (scope rules apply here of course).
In Erlang (where variable names must start with a capital letter) you cannot write this:
V = 1, % value "1" is now "labelled" "V"
% wherever you write "1", you can write "V" and vice versa
% the "label" and its value are interchangeable
V = V+1, % you cannot change the label (rebind it)
V = V*10, % you cannot change the label (rebind it)
instead you must write this:
V1 = 1, % value "1" is now labelled "V1"
V2 = V1+1, % value "2" is now labelled "V2"
V3 = V2*10, % value "20" is now labelled "V3"
As you can see, this is very inconvenient, mainly for code refactoring. If you want to insert a new line after the first line, you would have to renumber all the V* or write something like "V1a = ..."
So in Elixir you can rebind variables (change the meaning of the "label"), mainly for your convenience:
v = 1 # value "1" is now labelled "v"
v = v+1 # label "v" is changed: now "2" is labelled "v"
v = v*10 # value "20" is now labelled "v"
Summary: In imperative languages, variables are like named suitcases: you have a suitcase named "v". At first you put a sandwich in it. Then you put an apple in it (the sandwich is lost and perhaps eaten by the garbage collector). In Erlang and Elixir, the variable is not a place to put something in. It's just a name/label for a value. In Elixir you can change the meaning of the label. In Erlang you cannot. That's the reason why it doesn't make sense to "allocate memory for a variable" in either Erlang or Elixir, because variables do not occupy space. Values do. Now perhaps you see the difference clearly.
If you want to dig deeper:
1) Look at how "unbound" and "bound" variables work in Prolog. This is the source of this maybe slightly strange Erlang concept of "variables which do not vary".
2) Note that "=" in Erlang really is not an assignment operator, it's just a match operator! When matching an unbound variable with a value, you bind the variable to that value. Matching a bound variable is just like matching a value it's bound to. So this will yield a match error:
V = 1,
V = 2, % in fact this is matching: 1 = 2
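3) That's not the case in Elixir, where "=" rebinds a variable on the left-hand side even if it is already bound. So in Elixir there must be a special syntax (the pin operator, ^) to force matching: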
v = 1
v = 2 # rebinding variable to 2
^v = 3 # matching: 2 = 3 -> error
Immutability means that data structures don't change. For example the function HashSet.new returns an empty set and as long as you hold on to the reference to that set it will never become non-empty. What you can do in Elixir though is to throw away a variable reference to something and rebind it to a new reference. For example:
s = HashSet.new
s = HashSet.put(s, :element)
s # => #HashSet<[:element]>
What cannot happen is the value under that reference changing without you explicitly rebinding it:
s = HashSet.new
ImpossibleModule.impossible_function(s)
s # => #HashSet<[:element]> will never be returned, instead you always get #HashSet<[]>
Contrast this with Ruby, where you can do something like the following:
s = Set.new
s.add(:element)
s # => #<Set: {:element}>
Erlang, and by extension Elixir, which is built on top of it, embrace immutability.
They simply don't allow the value in a given memory location to change. Never, until the value gets garbage collected or goes out of scope.
Variables aren't the immutable thing. The data they point to is the immutable thing. That's why changing a variable is referred to as rebinding.
You're pointing it at something else, not changing the thing it points to.
x = 1 followed by x = 2 doesn't change the data stored in computer memory where the 1 was to a 2. It puts a 2 in a new place and points x at it.
x is only accessible by one process at a time so this has no impact on concurrency and concurrency is the main place to even care if something is immutable anyway.
Rebinding doesn't change the state of an object at all: the value is still in the same memory location, but its label (variable) now points to another memory location, so immutability is preserved. Rebinding is not available in Erlang, but while it is in Elixir, this does not break any constraint imposed by the Erlang VM, thanks to its implementation.
The reasons behind this choice are well explained by José Valim in this gist.
Let's say you had a list
l = [1, 2, 3]
and you had another process that was taking lists and repeatedly performing "stuff" against them, so that changing a list in the middle of that work would be bad. You might send that list like
send(worker, {:dostuff, l})
Now, your next bit of code might want to update l with more values for further work that's unrelated to what that other process is doing.
l = l ++ [4, 5, 6]
Oh no, now that first process is going to have undefined behavior because you changed the list right? Wrong.
That original list remains unchanged. What you really did was make a new list based on the old one and rebind l to that new list.
The separate process never has access to l. The data l originally pointed at is unchanged and the other process (presumably, unless it ignored it) has its own separate reference to that original list.
What matters is you can't share data across processes and then change it while another process is looking at it. In a language like Java where you have some mutable types (all primitive types plus references themselves) it would be possible to share a structure/object that contained say an int and change that int from one thread while another was reading it.
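In fact, it's possible to change a large integer type in Java partially while it's read by another thread: the Java Language Specification permits a write to a non-volatile long or double to be performed as two separate 32-bit writes, so a reader can see half of the old value and half of the new one. Most 64-bit JVMs perform these writes atomically in practice, but the specification does not require it. Anyway, point is, you can pull the rug out from under other processes/threads by changing data in a place that both are looking at simultaneously.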
That's not possible in Erlang and by extension Elixir. That's what immutability means here.
To be a bit more specific, in Erlang (the original language for the VM Elixir runs on) everything was single-assignment immutable variables and Elixir is hiding a pattern Erlang programmers developed to work around this.
In Erlang, if A = 3, then that was going to be A's value for the duration of that variable's existence, until it dropped out of scope and was garbage collected.
This was useful at times (nothing changes after assignment or pattern match, so it is easy to reason about what a function is doing) but also a bit cumbersome if you were doing multiple things to a variable or collection over the course of executing a function.
Code would often look like this:
A=input,
A1=do_something(A),
A2=do_something_else(A1),
A3=more_of_the_same(A2)
This was a bit clunky and made refactoring more difficult than it needed to be. Elixir is doing this behind the scenes, but hiding it from the programmer via macros and code transforms performed by the compiler.
Great discussion here: immutability-in-elixir
The variables really are immutable in a sense: every new rebinding (assignment) is only visible to accesses that come after it. All earlier accesses still refer to the old value(s) that were bound at the time they were defined.
foo = 1
call_1 = fn -> IO.puts(foo) end
foo = 2
call_2 = fn -> IO.puts(foo) end
foo = 3
foo = foo + 1
call_3 = fn -> IO.puts(foo) end
call_1.() #prints 1
call_2.() #prints 2
call_3.() #prints 4
To put it very simply:
variables in Elixir are not like containers where you keep adding, removing, or modifying items.
Instead, they are like labels attached to a container: reassigning a variable is as simple as picking the label off one container and placing it on a new container with the expected data in it.

Does dict get of a dict from a nested dictionary create a copy in Tcl?

Let's look at the sample code below:
set m [ dict create 1 [ dict create 2 3] 4 [ dict create 5 6 ] ]
set p [ dict get $m 4 ]
Now, here is the question. Assuming that I make no changes to p:
Is Tcl creating another copy, or is p just a pointer? Thanks.
Tcl's semantic model is of immutable values, i.e., when you are looking at a value, nothing that happens elsewhere in the world makes that value change. (Variables can change, but that's by putting a different value into them.) A consequence of this is that Tcl can aggressively share references to values. This means that its collection values (lists and dictionaries) hold these references efficiently, and the dict get operation will simply copy a reference out; the actual value itself exists in neither place, but the dictionary and the variable both have handles for it. It looks just like a copy, but it's more efficient.
There is an additional nuance: when a variable has the only reference to a value, operations on the variable can directly modify the value instead of having to copy. That's very much not something you can see however, except that it boosts performance.
No, it doesn't create a new copy until there is a change through one of the "owners" of the object.
But p still isn't a pointer. For all programming intents and purposes, it's just a plain variable holding a value.
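The visible consequence of this sharing can be sketched with the dictionary from the question. This is a minimal illustration, assuming Tcl 8.5 or later; the replacement value "changed" is made up for the example.

```tcl
set m [dict create 1 [dict create 2 3] 4 [dict create 5 6]]
set p [dict get $m 4]

# Updating p puts a new value into the variable p. The value that m holds
# under key 4 was shared up to this point, and it is left untouched.
dict set p 5 changed

puts [dict get $p 5]     ;# changed
puts [dict get $m 4 5]   ;# 6
```

Whether Tcl copied eagerly or only at the moment of the dict set cannot be observed from the script; the result is the same either way.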

Are values and keys guaranteed to be in a consistent order?

When applied to a Dict, will values(...) and keys(...) return items in matching order?
In other words, is zip(keys(d), values(d)) guaranteed to contain precisely the key-value pairs of the dictionary d?
Option 1
The current Julia source code indicates that the keys and vals of a Dict() object are stored as Array objects, which are ordered. Thus, you could just use values() and keys() separately, as in your question formulation. But, it is dangerous to rely on under the hood implementation details that aren't documented, since they might be changed without notice.
Option 2
An OrderedDict from the DataStructures package (along with the functions values() and keys()) is probably the simplest and safest way to be certain of consistent ordering. It's ok if you don't specifically need the ordering.
Option 3
If you don't want to deal with the added hassle of installing and loading the DataStructures package, you could just use Julia's built in syntax for handling this kind of thing, e.g.
Mydict = Dict("a" => 1, "b" => 2, "c" => 1)
a = [(key, val) for (key, val) in Mydict]
The use of zip() as given in the question formulation just adds complexity and risk in this situation.
If you want the entities separate, you could then use:
Keys = [key for (key, val) in Mydict]
Values = [val for (key, val) in Mydict]
or just refer to a[idx][1] for the idx element of Keys when you need it.
Currently your assertion seems to be true:
julia> let
           d = Dict(i => i^2 for i in 1:10_000)
           z = zip(keys(d), values(d))
           for (pair, tupl) in zip(d, z)
               # each (key, value) pair from d must equal the zipped tuple
               @assert pair[1] == tupl[1] && pair[2] == tupl[2]
           end
           @info "Success"
       end
[ Info: Success
But that is an undocumented implementation detail as Michael Ohlrogge explains.
Stefan Karpinski's comment on "show(dict) now sorted by key", in #16743:
This has performance implications for printing very large Dicts. I don't think it's a good idea. I do, however, think that making Dict ordered is a good idea that we should go ahead with.
See also:
#10116 WIP: try ordered Dict representation.
Most importantly, what are you trying to do? Perhaps an OrderedDict is what you need?
Yes, keys and values return items in matching order. Unless, as Dan Getz pointed out above, the dictionary is modified in between using the two iterators.
I think it would be relatively perverse for a dictionary not to have this behavior. It was obvious to us that the order should match, to the point that it didn't even occur to us to mention this explicitly in the documentation.
Another way to ensure corresponding order between keys and values is using imap from the Iterators package in the following way:
using Iterators
d = Dict(1=>'a',2=>'b',3=>'c')
# keys iterator is `imap(first,d)`
julia> collect(imap(first,d))
3-element Array{Any,1}:
2
3
1
# values iterator is `imap(last,d)`
julia> collect(imap(last,d))
3-element Array{Any,1}:
'b'
'c'
'a'
This method can potentially be adapted for other structures. All the other comments and answers are also good.

Can I look up a key in a Tcl dict when I have the value? Reverse lookup

I have a Tcl dict containing keys and their values. Is there a way to do a "reverse lookup", i.e. looking for the value and retrieving the key?
I know that this does not sound like state of the art programming, but the dict already exists in my code and I would not like to recreate it the other way around just because of this one time I need it.
You can use the dict filter command. Values can repeat, so you should expect more than one single key.
set d [dict create a b c d e b]
# note that the "b" value is repeated for keys "a" and "e"
dict filter $d value b
-> a b e b
So you can use something like this:
set lookupVal b
dict for {k v} [dict filter $d value $lookupVal] {
    lappend keys $k
}
puts $keys
-> a e
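One caveat with the approach above: the value form of dict filter treats its argument as a glob pattern rather than a literal string. The sketch below, assuming Tcl 8.5 or later, shows a shorter way to collect the keys and an exact-match variant using the script form of dict filter. The extra entry f with value b* is made up to expose the difference.

```tcl
set d [dict create a b c d e b f b*]

# [dict keys] on the filtered dictionary collects the keys without a loop.
puts [dict keys [dict filter $d value b]]     ;# a e

# The lookup value is a glob pattern, so "b*" also matches the plain "b" values.
set lookupVal b*
puts [dict keys [dict filter $d value $lookupVal]]   ;# a e f

# The script form compares with eq, so only the literal string matches.
puts [dict keys [dict filter $d script {k v} {expr {$v eq $lookupVal}}]]   ;# f
```

If the values being looked up can contain *, ?, [ or \, the script form is the safer of the two.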
If your values are unique, the dirty-hack way to get the key for a particular value is this:
set theKey [dict get [lreverse $theDict] $theValue]
I wouldn't particularly recommend it as it's a type-buster, but it will do the right thing. (If you've got the same value several times, this will return the key for the first instance.)
Note that you still retain the original dictionary in $theDict if you do this. (Well, assuming you're satisfied with Tcl's detailed type semantics when behind-the-scenes type conversions are happening.)
If you're thinking about doing this more than very occasionally, consider keeping a reversed dictionary around alongside the original so that you can do the lookup rapidly. This applies even if you use the dict keys [dict filter …] solution; linear scans of anything large can definitely slay you otherwise.
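A reversed dictionary of the kind suggested above could be built like this. It is a minimal sketch, assuming Tcl 8.5 or later; the variable name reversed is hypothetical, and each value maps to a list of keys so that repeated values are not lost.

```tcl
set d [dict create a b c d e b]

# One linear pass builds value -> list of keys.
set reversed [dict create]
dict for {k v} $d {
    dict lappend reversed $v $k
}

# Each later lookup is a plain dictionary access instead of a scan.
puts [dict get $reversed b]   ;# a e
puts [dict get $reversed d]   ;# c
```

The reversed dictionary has to be updated (or rebuilt) whenever the original changes, which is the price of the fast lookup.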
