The Go Programming Language Specification says:
3. The iteration order over maps is not specified. [...]
That's to be expected since a map type can be implemented as a hash table, or as a search tree, or as some other data structure. But how is map actually implemented in Go?
Put differently, what determines the iteration order of the keys in
for k, _ := range m { fmt.Println(k) }
I started wondering about this after I saw that a map with string keys apparently does have a certain iteration order. A program like
package main
import ("fmt"; "math/rand"; "time")
func main() {
rand.Seed(time.Now().UnixNano())
words := [...]string{"foo", "bar", "a", "b", "c", "hello", "world",
"0", "1", "10", "100", "123"}
stringMap := make(map[string]byte)
for _, i := range rand.Perm(len(words)) {
stringMap[words[i]] = byte(rand.Int())
}
fmt.Print("stringMap keys:")
for k, _ := range stringMap { fmt.Print(" ", k) }
fmt.Println()
}
prints the following on my machine:
stringMap keys: a c b 100 hello foo bar 10 world 123 1 0
regardless of the insertion order.
The equivalent program with a map[byte]byte map also prints the keys in a shuffled order, but here the key order depends on the insertion order.
How is all this implemented? Is the map specialized for integers and for strings?
Map is implemented in Go as a hashmap.
The Go run-time uses a common hashmap implementation which is implemented in C. The only implementation differences between map[string]T and map[byte]T are: hash function, equivalence function and copy function.
Unlike (some) C++ maps, Go maps aren't fully specialized for integers and for strings.
In Go release.r60, the iteration order is independent from insertion order as long as there are no key collisions. If there are collisions, iteration order is affected by insertion order. This holds true regardless of key type. There is no difference between keys of type string and keys of type byte in this respect, so it is only a coincidence that your program always printed the string keys in the same order. The iteration order is always the same unless the map is modified.
However, in the newest Go weekly release (and in Go1 which may be expected to be released this month), the iteration order is randomized (it starts at a pseudo-randomly chosen key, and the hashcode computation is seeded with a pseudo-random number). If you compile your program with the weekly release (and with Go1), the iteration order will be different each time you run your program. That said, running your program an infinite number of times probably wouldn't print all possible permutations of the key set. Example outputs:
stringMap keys: b 0 hello c world 10 1 123 bar foo 100 a
stringMap keys: hello world c 1 10 bar foo 123 100 a b 0
stringMap keys: bar foo 123 100 world c 1 10 b 0 hello a
...
If the specs say the iteration order is not specified then a specific order in specific cases is not ruled out.
The point is one cannot rely on that order in any case, not even in some special case. The implementation is free to change this behavior at any given moment, run time included.
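When a program does need a reproducible order, the usual approach is to collect the keys and sort them before iterating. A minimal sketch, where the helper name sortedKeys is made up for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys is a hypothetical helper: it returns the keys of m in
// ascending order, independent of the map's internal iteration order.
func sortedKeys(m map[string]byte) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]byte{"foo": 1, "bar": 2, "a": 3}
	for _, k := range sortedKeys(m) {
		fmt.Println(k, m[k])
	}
}
```

The cost is an extra slice and an O(n log n) sort per traversal, which is usually acceptable for output that must be stable.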
Note that it is not that odd for order to be stable regardless of insertion order if there is a total order over the keys (as there frequently may be if they are of a homogeneous type); if nothing else, it can allow efficient searching over keys which generate the same hash.
This may well also reflect a different underlying implementation - in particular, this is something people might want for strings, but for integers you could use a sparse array instead.
To extend user811773's answer: a semi-random range iteration does not mean that every element of the map has an equal chance of being returned first. See https://medium.com/i0exception/map-iteration-in-go-275abb76f721 and https://play.golang.org/p/GpSd8i7XZoG.
package main
import "fmt"
type intSet map[int]struct{}
func (s intSet) put(v int) { s[v] = struct{}{} }
func (s intSet) get() (int, bool) {
for k := range s {
return k, true
}
return 0, false
}
func main() {
s := make(intSet)
for i := 0; i < 4; i++ {
s.put(i)
}
counts := make(map[int]int)
for i := 0; i < 1024*1024; i++ {
v, ok := s.get()
if !ok {return}
counts[v]++
}
for k, v := range counts {
fmt.Printf("Value: %v, Count: %v\n", k, v)
}
}
/*
Value: 1, Count: 130752
Value: 2, Count: 130833
Value: 0, Count: 655840
Value: 3, Count: 131151
*/
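If an unbiased pick is needed, one option is to stop relying on range order altogether and index a key slice with a uniformly distributed random number. The sketch below assumes the same intSet type as above; pickUniform is a hypothetical method name, and it costs O(n) per call because it rebuilds the key slice each time.

```go
package main

import (
	"fmt"
	"math/rand"
)

type intSet map[int]struct{}

// pickUniform is a hypothetical alternative to get: it copies the keys into
// a slice and indexes it with a uniformly distributed random number, so the
// result does not depend on where the range iteration happens to start.
func (s intSet) pickUniform() (int, bool) {
	if len(s) == 0 {
		return 0, false
	}
	keys := make([]int, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	return keys[rand.Intn(len(keys))], true
}

func main() {
	s := intSet{0: {}, 1: {}, 2: {}, 3: {}}
	counts := make(map[int]int)
	for i := 0; i < 1024*1024; i++ {
		v, _ := s.pickUniform()
		counts[v]++
	}
	// With a uniform pick, each count should be close to 1024*1024/4.
	fmt.Println(counts)
}
```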
What is the difference between the two declarations below?
type Demo struct {s string}
func getDemo1() []*Demo // 1
func getDemo2() []Demo // 2
Is there any memory difference between getDemo1 and getDemo2?
I'm going to answer this, despite my better judgement to just send OP to the tour and documentation/specification. Mostly because of this:
Is there any memory difference between getDemo1 and getDemo2?
The answer to this specific question depends on how you use the slice. Go is pass-by-value, so passing struct values around copies them. For instance, consider the following example.
https://play.golang.org/p/VzjYXwUy0EI
d1 := getDemo1()
d2 := getDemo2()
for _, v := range d1 {
// v is of type *Demo, so this modifies the value in the slice
v.s = "same"
}
fmt.Println(d1)
for _, v := range d2 {
// v is of type Demo, and is a COPY of the struct in the slice, so the original is not modified
v.s = "same"
}
So, as to the memory question: ranging over a []*Demo copies only a pointer per iteration (effectively a uint64), whereas ranging over a []Demo copies the entire struct and all its fields, so the pointer version obviously uses less memory there. BUT you can still index directly into the slice to avoid copies, except when you pass individual items of the slice around.
That said, passing the slice itself around, the two types have no difference in overhead. A slice is an abstraction of an array, and the slice itself that gets passed around is merely a slice header, which would be the same memory footprint regardless what type the slice contains.
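The claim about the slice header can be checked with unsafe.Sizeof, which measures the header only and not the backing array. This is a small sketch; headerSizes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Demo struct{ s string }

// headerSizes returns the size in bytes of a []Demo value and of a []*Demo
// value. unsafe.Sizeof looks only at the slice header (pointer, length,
// capacity), never at the elements.
func headerSizes() (values, pointers uintptr) {
	var v []Demo
	var p []*Demo
	return unsafe.Sizeof(v), unsafe.Sizeof(p)
}

func main() {
	v, p := headerSizes()
	fmt.Println(v == p) // true: the header is the same for both slice types

	// The elements are what differ in size.
	fmt.Println(unsafe.Sizeof(Demo{}), unsafe.Sizeof(&Demo{}))
}
```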
BTW, the paradigm for modifying the values in the case of []Demo is:
for i := range d2 {
d2[i].s = "same"
}
First, my definition of a composite key: two or more values combined to make the key. Not to be confused with composite keys in databases.
My goal is to save computed values of pow(x, y) in a hash table, where x and y are integers. This is where I need ideas on how to make a key, so that given x and y, I can look it up in the hash table, to find pow(x,y).
For example:
pow(2, 3) => {key(2,3):8}
What I want to figure out is how to get the map key for the pair (2,3), i.e. the best way to generate a key which is a combination of multiple values, and use it in hash table.
The easiest and most flexible way is to use a struct as the key type, including all the data you want to be part of the key, so in your case:
type Key struct {
X, Y int
}
And that's all. Using it:
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 =", m[Key{2, 2}])
fmt.Println("2^3 =", m[Key{2, 3}])
Output (try it on the Go Playground):
2^2 = 4
2^3 = 8
Spec: Map types: You may use any types as the key where the comparison operators == and != are fully defined, and the above Key struct type fulfills this.
Spec: Comparison operators: Struct values are comparable if all their fields are comparable. Two struct values are equal if their corresponding non-blank fields are equal.
One important thing: you should not use a pointer as the key type (e.g. *Key), because comparing pointers only compares the memory address, and not the pointed values.
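The difference between value keys and pointer keys is easy to see in a short sketch. The two lookup helpers are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

type Key struct{ X, Y int }

// foundByValue stores an entry under a struct key and looks it up with an
// equal struct value.
func foundByValue() bool {
	m := map[Key]int{Key{2, 3}: 8}
	_, ok := m[Key{2, 3}]
	return ok
}

// foundByPointer stores an entry under a pointer key and looks it up with a
// different pointer to an equal struct value.
func foundByPointer() bool {
	m := map[*Key]int{&Key{2, 3}: 8}
	_, ok := m[&Key{2, 3}]
	return ok
}

func main() {
	fmt.Println(foundByValue())   // true: struct keys are compared field by field
	fmt.Println(foundByPointer()) // false: the two pointers hold different addresses
}
```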
Also note that you could also use arrays (not slices) as key type, but arrays are not as flexible as structs. You can read more about this here: Why have arrays in Go?
This is how it would look like with arrays:
type Key [2]int
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 =", m[Key{2, 2}])
fmt.Println("2^3 =", m[Key{2, 3}])
Output is the same. Try it on the Go Playground.
Go can't hash a slice of ints, so a slice cannot be used as a map key.
Therefore the way I would approach this is to use a struct as the key and map it to a number.
Here is an example of how that could be done:
package main
import (
"fmt"
)
type Nums struct {
num1 int
num2 int
}
func main() {
powers := make(map[Nums]int)
numbers := Nums{num1: 2, num2: 4}
powers[numbers] = 16 // 2^4
fmt.Printf("%v", powers[numbers])
}
I hope that helps
Your specific problem is nicely solved by the other answers. I want to add an additional trick that may be useful in some corner cases.
Given that map keys must be comparable, you can also use interfaces. Interfaces are comparable if their dynamic values are comparable.
This allows you to essentially partition the map, i.e. to use multiple types of keys within the same data structure. For example if you want to store in your map n-tuples (it wouldn't work with arrays, because the array length is part of the type).
The idea is to define an interface with a dummy method (but it can surely be not dummy at all), and use that as map key:
type CompKey interface {
isCompositeKey() bool
}
var m = map[CompKey]string{} // initialized, because assigning to a nil map panics
At this point you can have arbitrary types implementing the interface, either explicitly or by just embedding it.
In this example, the idea is to make the interface method unexported so that other structs may just embed the interface without having to provide an actual implementation — the method can't be called from outside its package. It will just signal that the struct is usable as a composite map key.
type AbsoluteCoords struct {
CompKey
x, y int
}
type RelativeCoords struct {
CompKey
x, y int
}
func foo() {
p := AbsoluteCoords{x: 1, y: 2}
r := RelativeCoords{x: 10, y: 20}
m[p] = "foo"
m[r] = "bar"
fmt.Println(m[AbsoluteCoords{x: 10, y: 20}]) // "" (empty, types don't match)
fmt.Println(m[RelativeCoords{x: 10, y: 20}]) // "bar" (matches, key present)
}
Of course nothing stops you from declaring actual methods on the interface, that may be useful when ranging over the map keys.
The disadvantage of this interface key is that it is now your responsibility to make sure the implementing types are actually comparable. E.g. this map key will panic:
type BadKey struct {
CompKey
nonComparableSliceField []int
}
b := BadKey{nil, []int{1,2}}
m[b] = "bad!" // panic: runtime error: hash of unhashable type main.BadKey
All in all, this might be an interesting approach when you need to keep two sets of K/V pairs in the same map, e.g. to keep some sanity in function signatures or to avoid defining structs with N very similar map fields.
Playground https://play.golang.org/p/0t7fcvSWdy7
As I understand it, I cannot define equality for user-defined types in Go. So what would be the idiomatic way of computing the number of distinct objects of some custom type (possibly recursively defined). Here is an example of the kind of thing I am trying to do.
package main
import "fmt"
type tree struct {
left *tree
right *tree
}
func shapeOf(a tree) string {
temp := "{"
if a.left != nil {
temp += shapeOf(*(a.left))
}
temp += "}{"
if a.right != nil {
temp += shapeOf(*(a.right))
}
temp += "}"
return temp
}
func main() {
a := tree{nil, nil}
b := tree{nil, &a}
c := tree{nil, nil}
d := tree{nil, &c}
e := tree{nil, nil}
f := tree{&e, nil}
s := make(map[string]bool)
s[shapeOf(b)] = true
s[shapeOf(d)] = true
s[shapeOf(f)] = true
fmt.Println(len(s)) // As required, prints 2 because the first two trees have the same shape
}
It works, but the use of strings is extremely ugly, and probably inefficient too. Obviously I could easily write a recursive method to tell if two trees are equal - something like
func areEqual(a, b tree) bool
but this wouldn't enable me to use trees as map keys. What is the idiomatic Go way to do something like this?
You cannot define equality for a user-defined type because it is already defined by Go. Basically, all there is to know about it is explained in the comparable section.
Short story: two struct values can be compared if their fields can be compared (no slice, map or function). The same goes for equality: two structs are equal if their fields are equal. In your case, the problem is that when comparing pointers, Go compares the memory addresses, not the structs they point to.
So, is it possible to count the distinct values of a certain struct? Yes, if the struct contains no nested slice, map, function or pointer. For recursive types, that's not possible because you cannot define something like this:
type tree struct {
left tree
right tree
}
The idiomatic way of testing the equality of recursive types is to use reflect.DeepEqual(t1, t2 interface{}), as it follows indirections. However, this method is inefficient because it relies heavily on reflection. In your case, I do not think there is any clean and elegant solution to get what you want.
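One possible workaround, not proposed in the answer above, is to give every distinct shape a small integer ID and use a comparable struct of the two child IDs as the map key, which avoids both strings and reflection. This is only a sketch for the tree type from the question; shapeKey, shapeIndex and id are hypothetical names.

```go
package main

import "fmt"

type tree struct {
	left  *tree
	right *tree
}

// shapeKey is a comparable composite key: the shape IDs of both subtrees.
type shapeKey struct{ left, right int }

// shapeIndex assigns a small integer ID to every distinct tree shape seen.
type shapeIndex map[shapeKey]int

// id returns the shape ID of t. A nil tree has ID 0, and two trees receive
// the same ID exactly when their subtrees have pairwise equal IDs.
func (idx shapeIndex) id(t *tree) int {
	if t == nil {
		return 0
	}
	k := shapeKey{idx.id(t.left), idx.id(t.right)}
	if n, ok := idx[k]; ok {
		return n
	}
	n := len(idx) + 1 // IDs start at 1 so they never clash with nil's 0
	idx[k] = n
	return n
}

func main() {
	a := tree{}
	b := tree{nil, &a}
	c := tree{}
	d := tree{nil, &c}
	e := tree{}
	f := tree{&e, nil}

	idx := shapeIndex{}
	distinct := map[int]bool{}
	for _, t := range []*tree{&b, &d, &f} {
		distinct[idx.id(t)] = true
	}
	fmt.Println(len(distinct)) // 2, as in the question
}
```

The IDs are only meaningful relative to one shapeIndex, so all trees being compared have to go through the same index.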
How can one remove selected keys from a map?
Is it safe to combine delete() with range, as in the code below?
package main
import "fmt"
type Info struct {
value string
}
func main() {
table := make(map[string]*Info)
for i := 0; i < 10; i++ {
str := fmt.Sprintf("%v", i)
table[str] = &Info{str}
}
for key, value := range table {
fmt.Printf("deleting %v=>%v\n", key, value.value)
delete(table, key)
}
}
https://play.golang.org/p/u1vufvEjSw
This is safe! You can also find a similar sample in Effective Go:
for key := range m {
if key.expired() {
delete(m, key)
}
}
And the language specification:
The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0.
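The guarantee about removed entries can be exercised with a small self-contained program. This is a minimal sketch; deleteEven is a hypothetical helper that deletes selected keys while ranging over the same map.

```go
package main

import "fmt"

// deleteEven removes every even key from m while ranging over it.
// Deleting during iteration is permitted by the language specification.
func deleteEven(m map[int]string) {
	for k := range m {
		if k%2 == 0 {
			delete(m, k)
		}
	}
}

func main() {
	m := make(map[int]string)
	for i := 0; i < 10; i++ {
		m[i] = fmt.Sprint(i)
	}
	deleteEven(m)
	fmt.Println(len(m)) // 5: only the odd keys remain
}
```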
Sebastian's answer is accurate, but I wanted to know why it was safe, so I did some digging into the map source code. It looks like a call to delete(m, k) basically just sets a flag (as well as changing the count value) instead of actually deleting the value:
b->tophash[i] = Empty;
(Empty is a constant for the value 0)
What the map appears to actually be doing is allocating a set number of buckets depending on the size of the map, which grows as you perform inserts at the rate of 2^B (from this source code):
byte *buckets; // array of 2^B Buckets. may be nil if count==0.
So there are almost always more buckets allocated than you're using, and when you do a range over the map, it checks that tophash value of each bucket in that 2^B to see if it can skip over it.
To summarize, the delete within a range is safe because the data is technically still there, but when it checks the tophash it sees that it can just skip over it and not include it in whatever range operation you're performing. The source code even includes a TODO:
// TODO: consolidate buckets if they are mostly empty
// can only consolidate if there are no live iterators at this size.
This explains why calling delete(m, k) doesn't actually free up memory; it just removes the entry from the list of buckets you're allowed to access. If you want to free up the actual memory you'll need to make the entire map unreachable so that garbage collection will step in. You can do this, for a map variable m, using a line like
m = nil
I was wondering if a memory leak could happen. So I wrote a test program:
package main
import (
log "github.com/Sirupsen/logrus"
"os/signal"
"os"
"math/rand"
"time"
)
func main() {
log.Info("=== START ===")
defer func() { log.Info("=== DONE ===") }()
go func() {
m := make(map[string]string)
for {
k := GenerateRandStr(1024)
m[k] = GenerateRandStr(1024*1024)
for k2, _ := range m {
delete(m, k2)
break
}
}
}()
osSignals := make(chan os.Signal, 1)
signal.Notify(osSignals, os.Interrupt)
for {
select {
case <-osSignals:
log.Info("Received ^C command. Exit")
return
}
}
}
func GenerateRandStr(n int) string {
rand.Seed(time.Now().UnixNano())
const letterBytes = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
b := make([]byte, n)
for i := range b {
b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))]
}
return string(b)
}
It looks like the GC does free the memory, so it's okay.
In short, yes. See previous answers.
And also this, from here:
ianlancetaylor commented on Feb 18, 2015
I think the key to understanding this is to realize that while executing the body of a for/range statement, there is no current iteration. There is a set of values that have been seen, and a set of values that have not been seen. While executing the body, one of the key/value pairs that has been seen--the most recent pair--was assigned to the variable(s) of the range statement. There is nothing special about that key/value pair, it's just one of the ones that has already been seen during the iteration.
The question he's answering is about modifying map elements in place during a range operation, which is why he mentions the "current iteration". But it's also relevant here: you can delete keys during a range, and that just means that you won't see them later on in the range (and if you already saw them, that's okay).
In short: How do I traverse a map in sorted key order, regardless of the map's type?
I found a few related questions, the closest one suggesting that it can't be done without relying on the reflect package. Is this understanding correct?
Consider this Go code, which traverses two maps of different types, in sorted order of their keys:
mapOne := map[int]string {
1: "a",
2: "b",
3: "c",
}
keysOne := make([]int, 0, len(mapOne))
for key := range mapOne {
keysOne = append(keysOne, key)
}
sort.Ints(keysOne)
for _, key := range keysOne {
value := mapOne[key]
fmt.Println(key, value)
}
mapTwo := map[string]int {
"a": 1,
"b": 2,
"c": 3,
}
keysTwo := make([]string, 0, len(mapTwo))
for key := range mapTwo {
keysTwo = append(keysTwo, key)
}
sort.Strings(keysTwo)
for _, key := range keysTwo {
value := mapTwo[key]
fmt.Println(key, value)
}
The logic to extract the keys and then sort them is duplicated for the two
different map types. Is there any way to factor out this logic and avoid
duplication?
I got stuck trying to write an interface to provide a SortedKeys method. In
particular, the return type of SortedKeys depends on the type of the map,
and I can't figure out how to express that in Go.
I think whoever told you you'd need reflect was correct; that's probably overkill though. I think the duplication is acceptable here.
(alternatively, you could implement your own map that uses some kind of interface for keys, but you'd still end up needing to make a type that satisfies the interface for each underlying key type)
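Since Go 1.18 added type parameters, which postdates the answer above, the duplicated logic can also be factored out without reflect. The following is a sketch under that assumption; the ordered constraint and the sortedKeys function are names defined here for illustration, not part of the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// ordered is a local constraint listing key types that support the <
// operator. It is deliberately small and can be extended as needed.
type ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// sortedKeys returns the keys of m in ascending order for any map whose
// key type satisfies ordered.
func sortedKeys[K ordered, V any](m map[K]V) []K {
	keys := make([]K, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

func main() {
	mapOne := map[int]string{1: "a", 2: "b", 3: "c"}
	for _, key := range sortedKeys(mapOne) {
		fmt.Println(key, mapOne[key])
	}

	mapTwo := map[string]int{"a": 1, "b": 2, "c": 3}
	for _, key := range sortedKeys(mapTwo) {
		fmt.Println(key, mapTwo[key])
	}
}
```

The return type []K follows the key type of whatever map is passed in, which is the part the question could not express with interfaces alone.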