The following code creates a counter for each pair of float64 values.
Because the keys of a map cannot be slices, I have to use arrays as keys, which forces me to define the dimension with a constant.
counter := make(map[[2]float64]int)
for _, comb := range combinations { // combinations is an [n][2]int of index pairs
    for _, row := range data {
        counter[[...]float64{row[comb[0]], row[comb[1]]}]++
    }
}
Having said that, is there a way to make this map depend on the length of the keys (i.e. on the dimensions of combinations)?
I tried using a struct as key, but as far as I remember (I might be wrong), it was a bit slower... For my purposes (applying this to all combinations, ~n!), that is not the ideal solution.
Right now I'm only considering combinations of size 2 and 3, and I had to split this into two separate functions, which makes my code very verbose and harder to maintain.
Can you find a way to simplify this, so I can scale it to more dimensions?
Thanks for any input
Why not use a pointer to a slice as the key?
You could create a slice with make with a big enough capacity; as long as you do not surpass its capacity, the pointer will remain the same.
Take a look at https://play.golang.org/p/333tRMpBLv, which exemplifies my suggestion. Note that while len < cap the pointer of the slice does not change; append only creates a new backing array when len exceeds cap.
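A minimal sketch of that behavior (fmt's %p verb prints the address of a slice's first element):
package main

import "fmt"

func main() {
    s := make([]float64, 0, 4) // capacity fixed up front

    s = append(s, 1)
    fmt.Printf("%p\n", s) // address of the backing array

    s = append(s, 2, 3, 4)
    fmt.Printf("%p\n", s) // same address: len is still <= cap

    s = append(s, 5)      // len would exceed cap, so append reallocates
    fmt.Printf("%p\n", s) // different address
}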
Related
I have a slice that contains pointers to values. In a performance-critical part of my program, I'm adding or removing values from this slice. For the moment, inserting a value is just an append (O(1) complexity), and removal consists of searching the slice for the corresponding pointer value, from 0 to n-1, until the pointer is found (O(n)). To improve performance, I'd like to keep the values in the slice sorted, so that searching can be done by dichotomy (so O(log(n))).
But how can I compare pointer values? Pointer arithmetic is forbidden in Go, so AFAIK to compare pointer values p1 and p2 I have to use the unsafe package and do something like
uintptr(unsafe.Pointer(p1)) < uintptr(unsafe.Pointer(p2))
Now, I'm not comfortable using unsafe, at least because of its name. So, is that method correct? Is it portable? Are there potential pitfalls? Is there a better way to define an order on pointer values? I know I could use maps, but maps are slow as hell.
As said by others, don't do this. Performance can't be so critical that you have to resort to pointer arithmetic in Go.
Pointers are comparable, Spec: Comparison operators:
Pointer values are comparable. Two pointer values are equal if they point to the same variable or if both have value nil. Pointers to distinct zero-size variables may or may not be equal.
Just use a map with the pointers as keys. Simple as that. Yes, indexing maps is slower than indexing slices, but then again, if you wanted to keep your slice sorted and perform binary searches on it, the performance gap narrows: the (hash) map implementation gives you O(1) lookup while binary search is only O(log n). With a big data set, the map might even be faster than searching in the slice.
If you anticipate a big number of pointers in the map, pre-allocate a big one with make(), passing an estimated upper size; until your map exceeds that size, no reallocation will occur.
m := make(map[*mytype]struct{}, 1<<20) // Allocate map for 1 million entries
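For illustration, a rough sketch of using such a map as a set of pointers (mytype stands in for whatever element type your slice holds):
set := make(map[*mytype]struct{}, 1<<20)

p := &mytype{}
set[p] = struct{}{} // insert: amortized O(1)

if _, ok := set[p]; ok { // membership test: O(1)
    delete(set, p) // removal: O(1)
}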
I was reading this article: http://research.swtch.com/godata
It says (third paragraph of the Slices section):
Because slices are multiword structures, not pointers, the slicing operation does not need to allocate memory, not even for the slice header, which can usually be kept on the stack. This representation makes slices about as cheap to use as passing around explicit pointer and length pairs in C. Go originally represented a slice as a pointer to the structure shown above, but doing so meant that every slice operation allocated a new memory object. Even with a fast allocator, that creates a lot of unnecessary work for the garbage collector, and we found that, as was the case with strings above, programs avoided slicing operations in favor of passing explicit indices. Removing the indirection and the allocation made slices cheap enough to avoid passing explicit indices in most cases.
What...? Why does it not allocate any memory? Whether it is a multiword structure or a pointer, doesn't it need to allocate memory? It then mentions that a slice was originally represented as a pointer to that structure, and that every slice operation had to allocate a new memory object. Why does it not need to do that now? Very confused.
To expand on Pravin Mishra's answer:
the slicing operation does not need to allocate memory.
"Slicing operation" refers to things like s1[x:y] and not slice initialization or make([]int, x). For example:
var s1 = []int{0, 1, 2, 3, 4, 5} // <<- allocates (or put on stack)
s2 := s1[1:3] // <<- does not (normally) allocate
That is, the second line is similar to:
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
…
example := SliceHeader{uintptr(unsafe.Pointer(&s1[1])), 2, 5} // conversion needed since Data is a uintptr
Usually local variables like example are put on the stack. It's just as if this had been done instead of using a struct:
var exampleData uintptr
var exampleLen, exampleCap int
Those example* variables go onto the stack.
Only if the code does return &example or otherFunc(&example), or otherwise lets a pointer to it escape, will the compiler be forced to allocate the struct (or slice header) on the heap.
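A small sketch of that distinction (made-up functions; the actual placement decision belongs to the compiler's escape analysis):
func staysOnStack() int {
    s := []int{0, 1, 2, 3}
    t := s[1:3] // the slice header stays local, so it can live on the stack
    return len(t)
}

func escapesToHeap() *[]int {
    s := []int{0, 1, 2, 3}
    t := s[1:3]
    return &t // the header's address escapes, forcing it onto the heap
}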
Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now?
Imagine that instead of the above you did:
example2 := &SliceHeader{…same…}
// or
example3 := new(SliceHeader)
example3.Data = …
example3.Len = …
example3.Cap = …
i.e. the type is *SliceHeader rather than SliceHeader.
This is effectively what slices used to be (pre-Go 1.0), according to what you quoted.
It also used to be that both example2 and example3 would have to be allocated on the heap. That is the "memory for a new object" being referred to. I think that nowadays escape analysis will try to put both of these on the stack as long as the pointer(s) are kept local to the function, so it's not as big an issue anymore. Either way, avoiding one level of indirection is good; it's almost always faster to copy three ints than to copy a pointer and dereference it repeatedly.
Every data type allocates memory when it's initialized. In the blog post, he clearly mentions:
the slicing operation does not need to allocate memory.
And he is right. Now let's see how slices work in Go.
Slices hold references to an underlying array, and if you assign one slice to another, both refer to the same array. If a function takes a slice argument, changes it makes to the elements of the slice will be visible to the caller, analogous to passing a pointer to the underlying array.
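A quick illustration of that sharing (a made-up example, not from the blog):
a := []int{1, 2, 3}
b := a[:2] // b shares a's backing array
b[0] = 99
fmt.Println(a[0]) // prints 99: the write through b is visible through a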
Is it possible to convert a pointer to a certain value into a slice?
For example, I want to read a single byte from an io.Reader into a uint8 variable. io.Reader's Read accepts a slice as its argument, so I cannot simply give it a pointer to my variable as I would in C.
I think that creating a slice of length 1 and capacity 1 from a pointer is a safe operation. Obviously, it should be the same as creating a slice from an array of length 1, which is an allowed operation. Is there an easy way to do this with a plain variable? Or maybe I do not understand something and there are reasons why this is prohibited?
A slice is not only a pointer, like an array in C. It also contains the length and capacity of the data, like this:
struct {
    ptr *uint8
    len int
    cap int
}
So, yes, you will need to create a slice. The simplest way to create a slice holding the var a uint8 would be []uint8{a}:
a := uint8(42)
fmt.Printf("%#v\n", []uint8{a})
(But after rereading your question, this is not a solution at all.)
But if you wish to create a slice from the variable, pointing to the same memory, you could use the unsafe package. This is generally discouraged, though.
fmt.Printf("%#v\n", (*[1]uint8)(unsafe.Pointer(&a))[:])
Instead of (over)complicating this trivial task, why not use the simple solution? That is, pass .Read a length-1 slice and then assign its zeroth element to your variable.
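For example (a minimal sketch; r stands for whichever io.Reader you have):
buf := make([]byte, 1)
if _, err := r.Read(buf); err != nil {
    // handle the error
}
x := buf[0] // x now holds the byte that was read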
I found a way to overcome my case where I want to supply a variable to an io.Reader. The Go standard library is wonderful!
import (
    "encoding/binary"
    "io"
)
...
var x uint8
binary.Read(reader, binary.LittleEndian, &x)
As a side effect, this works for any fixed-size basic type and even for some non-basic ones.
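For instance, binary.Read can also fill a struct whose fields all have fixed sizes (a hypothetical type, just to illustrate):
type header struct {
    Magic uint16
    Size  uint32
}

var h header
if err := binary.Read(reader, binary.LittleEndian, &h); err != nil {
    // handle the error
}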
Edit: Jeremy Wall helped me realize I had asked a question more specific than I intended; here's a better version.
Say I want to represent a table associating values of some type B with sequences of values of some type A for which equality is defined. What is the best way to do that in Go?
Obviously for the table I'd want to use a Go map, but what can I use for the sequences of values of type A? Slices cannot be used as map keys in Go; arrays can, but the length of an array is part of its type, and I'm interested in sequences whose length is determined at runtime. I could (1) use arrays of A, declaring a maximum length for them, or (2) use slices of A and serialize them to strings for use as keys (this technique is familiar to Awk and Lua programmers...). Is there a better workaround for this "feature" of Go than the ones I've described?
As pointed out by Jeremy Wall in answer to the original version of the question, where I had A = int, option (2) is pretty good for integers, since you can use slices of runes, for which conversion to string is just a cast.
Will a sequence of runes instead of integers work for you? rune is an alias for int32, and the conversion to a string is just a cast:
package main

import "fmt"

type myKey struct {
    seq []int
}

func main() {
    m := make(map[string]string)
    key := []rune{1, 2}
    m[string(key)] = "foo"
    fmt.Print("lookup: ", m[string(key)])
}
You can play with this code here: http://play.golang.org/p/Kct1dum8A0
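If the element type isn't rune, a hedged sketch of option (2) is to serialize the slice into a string key yourself. For example, for []float64 (using math.Float64bits so equal values yield equal keys; the usual NaN and -0 bit-pattern caveats apply):
import (
    "encoding/binary"
    "math"
)

// key serializes a []float64 of any runtime length into a map key.
func key(seq []float64) string {
    buf := make([]byte, 8*len(seq))
    for i, v := range seq {
        binary.LittleEndian.PutUint64(buf[i*8:], math.Float64bits(v))
    }
    return string(buf)
}

Then m[key([]float64{1.5, 2.5})] = "foo" works for lengths known only at runtime.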
Does anyone know how to do this and what the pseudo code would look like?
As we all know, a hash table stores key/value pairs, and when a key is looked up, it returns the value associated with that key. What I want to do is understand the underlying structure behind that mapping. For example, if we lived in a world where there were no previously defined functions except for arrays, how could we replicate the Hashmaps that we have today?
Actually, some of today's Hashmap implementations are indeed made out of arrays, as you propose. Let me sketch how this works:
Hash Function
A hash function transforms your keys into an index into the first array (array K). A hash function such as MD5, or a simpler one usually involving a modulo operator, can be used for this.
Buckets
A simple array-based Hashmap implementation could use buckets to cope with collisions. Each element ('bucket') in array K itself contains an array (array P) of pairs. When adding or querying an element, the hash function points you to the correct bucket in K, which contains your desired array P. You then iterate over the elements in P until you find a matching key, or you append a new element at the end of P.
Mapping keys to buckets using the Hash
You should make sure that the number of buckets (i.e. the size of K) is a power of 2, let's say 2^b. To find the correct bucket index for some key, compute Hash(key) but only keep the first b bits. This is your index when cast to an integer.
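In practice the low-order b bits are usually the ones kept, because that reduces to a single bitmask (a small sketch, assuming b and an unsigned hash are already computed):
numBuckets := uint64(1) << b     // 2^b buckets
index := hash & (numBuckets - 1) // keep the low b bits of the hash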
Rescaling
Computing the hash of a key and finding the right bucket is very quick. But once a bucket fills up, you will have to iterate over more and more items before you get to the right one. So it is important to have enough buckets to properly distribute the objects, or your Hashmap will become slow.
Because you generally don't know in advance how many objects you will want to store in the Hashmap, it is desirable to grow or shrink the map dynamically. You can keep a count of the number of objects stored, and once it goes over a certain threshold you recreate the entire structure, but this time with a larger or smaller size for array K. That way, some of the buckets in K that were very full will have their elements divided among several buckets, and performance will improve.
Alternatives
You may also use a two-dimensional array instead of an array-of-arrays, or you may exchange array P for a linked list. Furthermore, instead of keeping a total count of stored objects, you may simply choose to recreate (i.e. rescale) the hashmap once one of the buckets contains more than some configured number of items.
A variation of what you are asking is described as an 'array hash table' in the Hash table Wikipedia entry.
Code
For code samples, take a look here.
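As a rough sketch, here is the bucketed design described above in Go (all names are made up for illustration; this is not any particular library's implementation):
package main

import "fmt"

// entry is one key/value pair stored in a bucket (an element of array P).
type entry struct {
    key   string
    value int
}

// HashMap is array K: a fixed number of buckets, each holding a slice of pairs.
type HashMap struct {
    buckets [][]entry
}

func NewHashMap(numBuckets int) *HashMap {
    return &HashMap{buckets: make([][]entry, numBuckets)}
}

// hash maps a key to a bucket index with a simple shift-and-add scheme
// followed by a modulo, as in the Hash Function section above.
func (m *HashMap) hash(key string) int {
    h := 0
    for i := 0; i < len(key); i++ {
        h = (h<<4 + int(key[i])) % len(m.buckets)
    }
    return h
}

// Put finds the key's bucket, then overwrites an existing pair or appends a new one.
func (m *HashMap) Put(key string, value int) {
    b := m.hash(key)
    for i := range m.buckets[b] {
        if m.buckets[b][i].key == key {
            m.buckets[b][i].value = value
            return
        }
    }
    m.buckets[b] = append(m.buckets[b], entry{key, value})
}

// Get scans the key's bucket for a matching pair.
func (m *HashMap) Get(key string) (int, bool) {
    for _, e := range m.buckets[m.hash(key)] {
        if e.key == key {
            return e.value, true
        }
    }
    return 0, false
}

func main() {
    m := NewHashMap(31) // a prime bucket count is a common choice
    m.Put("foo", 1)
    m.Put("bar", 2)
    fmt.Println(m.Get("foo")) // 1 true
    fmt.Println(m.Get("baz")) // 0 false
}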
Hope this helps.
Could you be more precise? Does one array contain the keys, and the other one the values?
If so, here is an example in Java (but there are few specificities of this language here):
for (int i = 0; i < keysArray.length; i++) {
    map.put(keysArray[i], valuesArray[i]);
}
Of course, you will have to instantiate your map object (if you are using Java, I suggest using a HashMap<Object, Object> instead of the obsolete Hashtable), and also test your arrays in order to avoid null objects and to check that they have the same size.
Sample Explanation:
The source linked below basically does two things:
1. Map Representation
An array of some number (X) of lists.
Choosing X as 2 power N is bad; (2 power N)-1, (2 power N)+1, or a prime number is good.
Example:
List myhashmap[hash_table_size];
// an array of (short) lists
// if the lists are long, then there are more collisions
NOTE: this is an array of lists, not two arrays (I can't see how a decent generic hashmap could be built with just 2 arrays)
If you know Algorithms > Graph theory > Adjacency list, this looks exactly the same.
2. Hash function
The hash function converts a string (the input) into a number (the hash value), which is an index into the array:
initialize the hash value to the first char (converted to an int)
for each further char, left shift 4 bits, then add the char (converted to an int)
Example:
int hash = input[0];
for (int i = 1; i < input.length(); i++) {
    hash = (hash << 4) + input[i];
}
hash = hash % list.size();
// list.size() here represents the 1st dimension of the (list of lists),
// that is, the 1st-dimension size of our map representation from point #1,
// which is hash_table_size
See at the first link:
int HTable::hash (char const * str) const
Source:
http://www.relisoft.com/book/lang/pointer/8hash.html
How does a hash table work?
Update
This is the best source: http://algs4.cs.princeton.edu/34hash/
You mean like this?
The following uses Ruby's irb as an illustration:
cities = ["LA", "SF", "NY"]
=> ["LA", "SF", "NY"]
items = ["Big Mac", "Hot Fudge Sundae"]
=> ["Big Mac", "Hot Fudge Sundae"]
price = {}
=> {}
price[[cities[0], items[1]]] = 1.29
=> 1.29
price
=> {["LA", "Hot Fudge Sundae"]=>1.29}
price[[cities[0], items[0]]] = 2.49
=> 2.49
price[[cities[1], items[0]]] = 2.99
=> 2.99
price
=> {["LA", "Hot Fudge Sundae"]=>1.29, ["LA", "Big Mac"]=>2.49, ["SF", "Big Mac"]=>2.99}
price[["LA", "Big Mac"]]
=> 2.49