How to slice a map[string]int into chunks

My objective is to take a map[string]int containing potentially up to a million entries and chunk it in sizes of up to 500 and POST the map to an external service. I'm newer to golang, so I'm tinkering in the Go Playground for now.
Any tips anyone has on how to improve the efficiency of my code base, please share!
Playground: https://play.golang.org/p/eJ4_Pd9X91c
The CLI output I'm seeing is:
original size 60
chunk bookends 0 20
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 20 40
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 40 60
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
The problem here is that while the chunk bookends are being calculated correctly, the x value is starting at 0 each time. I think I should expect it to start at the chunk bookend minimum, which would be 0, 20, 40, etc. How come the range is starting at zero each time?
Source:
package main
import (
"fmt"
"math/rand"
"strconv"
)
func main() {
items := make(map[string]int)
// Generate some fake data for our testing, in reality this could be 1m entries
for i := 0; i < 60; i ++ {
// int as strings are intentional here
items[strconv.FormatInt(int64(rand.Int()), 10)] = rand.Int()
}
// Create a map of just keys so we can easily chunk based on the numeric keys
i := 0
keys := make([]string, len(items))
for k := range items {
keys[i] = k
i++
}
fmt.Println("original size", len(keys))
//batchContents := make(map[string]int)
// Iterate numbers in the size batch we're looking for
chunkSize := 20
for chunkStart := 0; chunkStart < len(keys); chunkStart += chunkSize {
chunkEnd := chunkStart + chunkSize
if chunkEnd > len(items) {
chunkEnd = len(items)
}
// Iterate over the keys
fmt.Println("chunk bookends", chunkStart, chunkEnd)
for x := range keys[chunkStart:chunkEnd] {
fmt.Print(x, ",")
// Build the batch contents with the contents needed from items
// #todo is there a more efficient approach?
//batchContents[keys[i]] = items[keys[i]]
}
fmt.Println()
// #todo POST final batch contents
//fmt.Println(batchContents)
}
}

When you process a chunk:
for x := range keys[chunkStart:chunkEnd] {}
Here you are iterating over a slice with a single iteration variable, so that variable is the slice index, not the element at that index. Slicing creates a new slice whose indices start at 0, hence x always starts at 0. (When you iterate over a map, the first iteration variable is the key, because a map has no index, and the second is the value associated with that key.)
Instead you want this:
for _, key := range keys[chunkStart:chunkEnd] {}
Also note that it's redundant to first collect the keys in a slice and then process them. You can build the batches directly while iterating over the map once. Just keep track of how many entries the current batch holds so you know when you reach the chunk size; this count can be implicit if the data structure you use already tracks it (e.g. the length of a batch keys slice).
For example (try it on the Go Playground):
chunkSize := 20
batchKeys := make([]string, 0, chunkSize)
process := func() {
fmt.Println("Batch keys:", batchKeys)
batchKeys = batchKeys[:0]
}
for k := range items {
batchKeys = append(batchKeys, k)
if len(batchKeys) == chunkSize {
process()
}
}
// Process last, potentially incomplete batch
if len(batchKeys) > 0 {
process()
}
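To go from batches of keys to batches of key/value pairs that are ready to POST, the same single-pass idea can be applied to a map. The following is a minimal sketch rather than the answerer's code: the helper name forEachChunk and the send callback are hypothetical, and the actual HTTP call is left out.

```go
package main

import "fmt"

// forEachChunk (hypothetical helper) walks items once and hands batches of at
// most chunkSize entries to send. Each batch is a fresh map, so send may keep it.
func forEachChunk(items map[string]int, chunkSize int, send func(batch map[string]int)) {
	if chunkSize <= 0 {
		return // nothing sensible to do with a non-positive chunk size
	}
	batch := make(map[string]int, chunkSize)
	for k, v := range items {
		batch[k] = v
		if len(batch) == chunkSize {
			send(batch)
			batch = make(map[string]int, chunkSize)
		}
	}
	// Send the last, potentially incomplete batch.
	if len(batch) > 0 {
		send(batch)
	}
}

func main() {
	items := make(map[string]int)
	for i := 0; i < 45; i++ {
		items[fmt.Sprintf("key%d", i)] = i
	}
	forEachChunk(items, 20, func(batch map[string]int) {
		// A real program would POST the batch here.
		fmt.Println("batch size:", len(batch))
	})
}
```

With 45 entries and a chunk size of 20 this produces batches of 20, 20, and 5 entries. Allocating a new map per batch costs a little more than reusing one, but it is safe if send hands the batch to another goroutine.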

Related

Go determine number of word occurrences on a string slice

I'm having a hard time figuring out how to count the number of apps or words in a slice using the Go code I wrote.
Hoping someone could help me figure out how to count the number of occurrences?
https://play.golang.org/p/KvgI-lCz_c6
package main
import (
"fmt"
)
func main() {
apps := []string{"one", "two", "three", "one", "four"}
fmt.Println("apps:", apps)
o := CountOccurence(apps)
fmt.Println("=== o: ", o)
}
func CountOccurence(apps []string) map[string]int {
dict := make(map[string]int)
for k, v := range apps {
fmt.Println(k, v)
dict[v] = k
}
// fmt.Println("=== dict: ", dict)
return dict
}
Outputs the following
apps: [one two three one four]
0 one
1 two
2 three
3 one
4 four
=== o: map[four:4 one:3 three:2 two:1]
PS: go strings.Count only counts a string, not a []string.
What you currently do is you gather the different elements and you assign their index to them. If a word occurs multiple times, the highest index will be assigned to it.
As you stated, you want to count the words. So instead of the index, assign 1 for new words (first occurrence), and if it's already in the map, increment its value by 1.
You can index a map with a key that is not in it; the result is then the zero value of the map's value type, which is 0 for int. In other words, the map tells you the word was found 0 times (so far), so you don't even have to check whether a key is already in there, just go ahead and increment it:
dict[v]++
So CountOccurence() may look like this:
func CountOccurence(apps []string) map[string]int {
dict := make(map[string]int)
for _, v := range apps {
fmt.Println(v)
dict[v]++
}
return dict
}
Which will output (try it on the Go Playground):
apps: [one two three one four]
one
two
three
one
four
=== o: map[four:1 one:2 three:1 two:1]
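The zero-value behaviour that makes dict[v]++ work can be seen in isolation in the following small sketch. The function name countWords is hypothetical; it does the same counting as the answer above without the debug printing.

```go
package main

import "fmt"

// countWords (hypothetical name) counts how often each word appears.
func countWords(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++ // a missing key reads as 0, so this becomes 1 on first sight
	}
	return counts
}

func main() {
	counts := countWords([]string{"one", "two", "three", "one", "four"})

	// Indexing with a missing key returns the zero value without inserting it.
	fmt.Println(counts["five"]) // 0

	// The comma-ok form distinguishes "absent" from "present with value 0".
	n, ok := counts["five"]
	fmt.Println(n, ok) // 0 false

	n, ok = counts["one"]
	fmt.Println(n, ok) // 2 true
}
```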

A faster or slower way to clear truncated pointers?

In a truncate implementation I've read recently, the author uses the following way to clear the truncated items:
var nilItems = make(items, 16)
func (s *items) truncate(index int) {
var toClear items
*s, toClear = (*s)[:index], (*s)[index:]
for len(toClear) > 0 {
toClear = toClear[copy(toClear, nilItems):]
}
}
When I need to clear unwanted items, I would just iterate over the slice and set items to nil one by one.
I have set up a simple benchmark and it seems that the for loop way is faster.
I wonder what's the benefit of clearing with copy in bulk.
As mentioned by @MartinGallagher, your loop is recognized and optimized by the compiler, while the copy() version does "too much stuff" and is not optimized.
If you change your examples to fill with a non-nil pointer value, you'll see the loop version falls behind. Also don't allocate (make()) inside the benchmark loop, do that outside, and use b.ResetTimer() to exclude that time.
You also have a very small slice; if you increase its size, the difference will be more noticeable:
var x = new(int)
func BenchmarkSetNilOneByOne(b *testing.B) {
nums := make([]*int, 12800)
b.ResetTimer()
for i := 0; i < b.N; i++ {
for i := range nums {
nums[i] = x
}
}
}
func BenchmarkSetNilInBulk(b *testing.B) {
nils := make([]*int, 128)
for i := range nils {
nils[i] = x
}
orig := make([]*int, 12800)
var nums []*int
b.ResetTimer()
for i := 0; i < b.N; i++ {
nums = orig
for len(nums) > 0 {
nums = nums[copy(nums, nils):]
}
}
}
Benchmark results:
BenchmarkSetNilOneByOne-4 96571 10626 ns/op
BenchmarkSetNilInBulk-4 266690 4023 ns/op
Also note that your "bulk" version reassigns the nums slice header on every iteration of the inner loop. There is a faster way to fill the slice: you do not need an additional "nils" slice. Set the first element, then repeatedly copy the already filled part onto the unfilled part. This also does not require changing or reassigning the nums slice header. See "Is there analog of memset in go?"
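The fill-by-copying idea can be sketched as follows. This is an illustration under assumptions, not code from the linked question: the helper name fillPtrs is hypothetical, and it is written for []*int to match the benchmark above.

```go
package main

import "fmt"

// fillPtrs (hypothetical helper) sets every element of s to v by copying the
// already filled prefix onto the rest, doubling the filled part each time.
func fillPtrs(s []*int, v *int) {
	if len(s) == 0 {
		return
	}
	s[0] = v
	for filled := 1; filled < len(s); filled *= 2 {
		// copy stops at the shorter of the two slices, so the final,
		// partial step needs no extra bounds logic.
		copy(s[filled:], s[:filled])
	}
}

func main() {
	x := new(int)
	nums := make([]*int, 10)

	fillPtrs(nums, x)
	fmt.Println(nums[0] == x, nums[9] == x)

	fillPtrs(nums, nil) // clearing is just filling with nil
	fmt.Println(nums[0] == nil, nums[9] == nil)
}
```

The number of copy calls grows with the logarithm of the slice length, and the slice header passed in is never reassigned.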

How to randomly split a map in Go as evenly as possible?

I have a quick question. I am fairly new to golang. Say I have a map like so:
map[int]string
How could I randomly split it into two maps or arrays and as close to even as possible? So for example, if there are 15 items, it would be split 7 - 8.
For example:
func split(m map[int]string) (odds map[int]string, evens map[int]string) {
n := 1
odds = make(map[int]string)
evens = make(map[int]string)
for key, value := range m {
if n % 2 == 0 {
evens[key] = value
} else {
odds[key] = value
}
n++
}
return odds, evens
}
It is actually an interesting example, because it shows a few aspects of Go that are not obvious for beginners:
range m iterates in an unspecified order (randomized in practice), unlike in any other language as far as I know,
the modulo operator % returns the remainder of the integer division,
a function can return several values.
You could do something like this:
myStrings := make(map[int]string)
// Values are added to myStrings
myStrings2 := make(map[int]string)
// Seed system time for random numbers
rand.Seed(time.Now().UTC().UnixNano())
for k, v := range myStrings {
if rand.Float32() < 0.5 {
myStrings2[k] = v
delete(myStrings, k)
}
}
https://play.golang.org/p/6OnH1k4FMu
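Neither approach above guarantees both properties at once: the coin-flip version is random but can produce halves of quite different sizes, and the odd/even version depends on the map's iteration order for its randomness. One possible alternative, sketched here under assumptions (the function name splitEvenly is hypothetical), is to shuffle the keys explicitly with rand.Shuffle and cut the shuffled list in the middle.

```go
package main

import (
	"fmt"
	"math/rand"
)

// splitEvenly (hypothetical helper) returns two maps whose sizes differ by at
// most one, with the keys assigned to each half at random.
func splitEvenly(m map[int]string) (a, b map[int]string) {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// Shuffle explicitly instead of relying on map iteration order.
	rand.Shuffle(len(keys), func(i, j int) { keys[i], keys[j] = keys[j], keys[i] })

	half := len(keys) / 2
	a = make(map[int]string, half)
	b = make(map[int]string, len(keys)-half)
	for i, k := range keys {
		if i < half {
			a[k] = m[k]
		} else {
			b[k] = m[k]
		}
	}
	return a, b
}

func main() {
	m := make(map[int]string)
	for i := 0; i < 15; i++ {
		m[i] = fmt.Sprintf("item %d", i)
	}
	a, b := splitEvenly(m)
	fmt.Println(len(a), len(b)) // 7 8
}
```

On Go versions before 1.20 the global generator has to be seeded (as in the answer above) for the split to differ between runs.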

Getting a slice of keys from a map

Is there any simpler/nicer way of getting a slice of keys from a map in Go?
Currently I am iterating over the map and copying the keys to a slice:
i := 0
keys := make([]int, len(mymap))
for k := range mymap {
keys[i] = k
i++
}
This is an old question, but here's my two cents. PeterSO's answer is slightly more concise, but slightly less efficient. You already know how big it's going to be so you don't even need to use append:
keys := make([]int, len(mymap))
i := 0
for k := range mymap {
keys[i] = k
i++
}
In most situations it probably won't make much of a difference, but it's not much more work, and in my tests (using a map with 1,000,000 random int64 keys and then generating the array of keys ten times with each method), it was about 20% faster to assign members of the array directly than to use append.
Although setting the capacity eliminates reallocations, append still has to do extra work to check if you've reached capacity on each append.
For example,
package main
func main() {
mymap := make(map[int]string)
keys := make([]int, 0, len(mymap))
for k := range mymap {
keys = append(keys, k)
}
}
To be efficient in Go, it's important to minimize memory allocations.
You can also get the keys as a slice of type []reflect.Value using the MapKeys method of the Value struct from the "reflect" package:
package main
import (
"fmt"
"reflect"
)
func main() {
abc := map[string]int{
"a": 1,
"b": 2,
"c": 3,
}
keys := reflect.ValueOf(abc).MapKeys()
fmt.Println(keys) // e.g. [a b c]; the order is not specified
}
Go now has generics. You can get the keys of any map with maps.Keys.
Example usage:
intMap := map[int]int{1: 1, 2: 2}
intKeys := maps.Keys(intMap)
// intKeys is []int
fmt.Println(intKeys)
strMap := map[string]int{"alpha": 1, "bravo": 2}
strKeys := maps.Keys(strMap)
// strKeys is []string
fmt.Println(strKeys)
The maps package is found in golang.org/x/exp/maps. It is experimental and outside the Go compatibility guarantee. They aim to move it into the standard library in the future.
Playground: https://go.dev/play/p/fkm9PrJYTly
For those who don't like to import exp packages, you can copy the source code:
// Keys returns the keys of the map m.
// The keys will be in an indeterminate order.
func Keys[M ~map[K]V, K comparable, V any](m M) []K {
r := make([]K, 0, len(m))
for k := range m {
r = append(r, k)
}
return r
}
I made a sketchy benchmark on the three methods described in other responses.
Obviously pre-allocating the slice before pulling the keys is faster than appending, but surprisingly, the reflect.ValueOf(m).MapKeys() method is significantly slower than the latter:
❯ go run scratch.go
populating
filling 100000000 slots
done in 56.630774791s
running prealloc
took: 9.989049786s
running append
took: 18.948676741s
running reflect
took: 25.50070649s
Here's the code: https://play.golang.org/p/Z8O6a2jyfTH
(running it in the playground aborts claiming that it takes too long, so, well, run it locally.)
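For a quick local comparison that does not need a hand-rolled timer, the standard testing package can run a benchmark from an ordinary program via testing.Benchmark. The sketch below is not the code behind the numbers above: the function names keysByIndex and keysByAppend, the sink variable, and the map size are assumptions chosen to keep the run short.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the results alive so the work is not optimized away.
var sink []int

// keysByIndex pre-sizes the slice and assigns by index.
func keysByIndex(m map[int]int) []int {
	keys := make([]int, len(m))
	i := 0
	for k := range m {
		keys[i] = k
		i++
	}
	return keys
}

// keysByAppend pre-sizes only the capacity and uses append.
func keysByAppend(m map[int]int) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	testing.Init() // registers the testing flags when not run under "go test"

	m := make(map[int]int, 100000)
	for i := 0; i < 100000; i++ {
		m[i] = i
	}

	byIndex := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = keysByIndex(m)
		}
	})
	byAppend := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = keysByAppend(m)
		}
	})

	fmt.Println("index: ", byIndex)
	fmt.Println("append:", byAppend)
}
```

The absolute numbers depend on the machine and the Go version, so treat any single run as indicative only.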
A nicer way to do this would be to use append:
keys := []int{}
for k := range mymap {
keys = append(keys, k)
}
Other than that, you’re out of luck—Go isn’t a very expressive language.
A generic version (go 1.18+) of Vinay Pai's answer.
// MapKeysToSlice extracts the keys of a map as a slice.
func MapKeysToSlice[K comparable, V any](m map[K]V) []K {
keys := make([]K, len(m))
i := 0
for k := range m {
keys[i] = k
i++
}
return keys
}
Visit https://play.golang.org/p/dx6PTtuBXQW
package main
import (
"fmt"
"sort"
)
func main() {
mapEg := map[string]string{"c":"a","a":"c","b":"b"}
keys := make([]string, 0, len(mapEg))
for k := range mapEg {
keys = append(keys, k)
}
sort.Strings(keys)
fmt.Println(keys)
}
There is a cool lib called lo
A Lodash-style Go library based on Go 1.18+ Generics (map, filter, contains, find...)
With this lib you can do many convenient operations like map, filter, reduce and more. There are also some helpers for the map type.
Keys
Creates an array of the map keys.
keys := lo.Keys[string, int](map[string]int{"foo": 1, "bar": 2})
// []string{"bar", "foo"}
Values
Creates an array of the map values.
values := lo.Values[string, int](map[string]int{"foo": 1, "bar": 2})
// []int{1, 2}

In Go how to get a slice of values from a map?

If I have a map m is there a better way of getting a slice of the values v than this?
package main
import (
"fmt"
)
func main() {
m := make(map[int]string)
m[1] = "a"
m[2] = "b"
m[3] = "c"
m[4] = "d"
// Can this be done better?
v := make([]string, len(m), len(m))
idx := 0
for _, value := range m {
v[idx] = value
idx++
}
fmt.Println(v)
}
Is there a built-in feature of a map? Is there a function in a Go package, or is this the only way to do this?
As an addition to jimt's post:
You may also use append rather than explicitly assigning the values to their indices:
m := make(map[int]string)
m[1] = "a"
m[2] = "b"
m[3] = "c"
m[4] = "d"
v := make([]string, 0, len(m))
for _, value := range m {
v = append(v, value)
}
Note that the length is zero (no elements present yet) but the capacity (allocated space) is initialized with the number of elements of m. This is done so append does not need to allocate memory each time the capacity of the slice v runs out.
You could also make the slice without the capacity value and let append allocate the memory for itself.
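The difference between length and capacity matters here because mixing a non-zero length with append is an easy mistake to make. The following sketch shows both variants side by side; the function names valuesOf and valuesWrong are hypothetical.

```go
package main

import "fmt"

// valuesOf (hypothetical helper) collects the values of m using append on a
// slice with zero length and pre-sized capacity.
func valuesOf(m map[int]string) []string {
	v := make([]string, 0, len(m))
	for _, value := range m {
		v = append(v, value)
	}
	return v
}

// valuesWrong shows the pitfall: a non-zero length combined with append.
func valuesWrong(m map[int]string) []string {
	v := make([]string, len(m)) // already holds len(m) empty strings
	for _, value := range m {
		v = append(v, value) // lands after the empty strings
	}
	return v
}

func main() {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d"}

	good := valuesOf(m)
	fmt.Println(len(good), cap(good)) // 4 4

	bad := valuesWrong(m)
	fmt.Println(len(bad)) // 8: four empty strings followed by the four values
}
```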
Unfortunately, no. There is no builtin way to do this.
As a side note, you can omit the capacity argument in your slice creation:
v := make([]string, len(m))
The capacity is implied to be the same as the length here.
Go 1.18
You can use maps.Values from the golang.org/x/exp package.
Values returns the values of the map m. The values will be in an indeterminate order.
func main() {
m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d"}
v := maps.Values(m)
fmt.Println(v)
}
The package exp includes experimental code. The signatures may or may not change in the future, and may or may not be promoted to the standard library.
If you don't want to depend on an experimental package, you can easily implement it yourself. In fact, this code is a copy-paste from the exp package:
func Values[M ~map[K]V, K comparable, V any](m M) []V {
r := make([]V, 0, len(m))
for _, v := range m {
r = append(r, v)
}
return r
}
Not necessarily better, but the cleaner way to do this is by defining both the Slice LENGTH and CAPACITY like txs := make([]Tx, 0, len(txMap))
// Defines the Slice capacity to match the Map elements count
txs := make([]Tx, 0, len(txMap))
for _, tx := range txMap {
txs = append(txs, tx)
}
Full example:
package main
import (
"github.com/davecgh/go-spew/spew"
)
type Tx struct {
from string
to string
value uint64
}
func main() {
// Extra touch pre-defining the Map length to avoid reallocation
txMap := make(map[string]Tx, 3)
txMap["tx1"] = Tx{"andrej", "babayaga", 10}
txMap["tx2"] = Tx{"andrej", "babayaga", 20}
txMap["tx3"] = Tx{"andrej", "babayaga", 30}
txSlice := getTXsAsSlice(txMap)
spew.Dump(txSlice)
}
func getTXsAsSlice(txMap map[string]Tx) []Tx {
// Defines the Slice capacity to match the Map elements count
txs := make([]Tx, 0, len(txMap))
for _, tx := range txMap {
txs = append(txs, tx)
}
return txs
}
Simple solution but a lot of gotchas. Read this blog post for more details: https://web3.coach/golang-how-to-convert-map-to-slice-three-gotchas
As far as I'm currently aware, Go doesn't have a method for concatenating strings/bytes into a resulting string without making at least /two/ copies.
You currently have to grow a []byte, since all string values are immutable, THEN you have to use the string conversion to have the language create a 'blessed' string object, which copies the buffer, since something somewhere could still hold a reference to the memory backing the []byte.
If a []byte is suitable, then you can gain a very slight lead over the bytes.Join function by making one allocation and doing the copy calls yourself.
package main
import (
"fmt"
)
func main() {
m := make(map[int]string)
m[1] = "a" ; m[2] = "b" ; m[3] = "c" ; m[4] = "d"
ip := 0
/* If the elements of m are not all of fixed length you must use a method like this;
* in that case also consider:
* bytes.Join() and/or
* strings.Join()
* They are likely preferable for maintainability over small performance change.
for _, v := range m {
ip += len(v)
}
*/
ip = len(m) * 1 // length of elements in m
r := make([]byte, ip, ip)
ip = 0
for _, v := range m {
ip += copy(r[ip:], v)
}
// r (return value) is currently a []byte, it mostly differs from 'string'
// in that it can be grown and has a different default fmt method.
fmt.Printf("%s\n", r)
}
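If a string result is needed after all, strings.Builder (in the standard library since Go 1.10) is documented as minimizing memory copying when building a string. A minimal sketch, assuming the total length is computed up front as in the commented-out loop above; the function name concatValues is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// concatValues (hypothetical helper) joins all values of m into one string.
// The order of the pieces follows map iteration and is therefore unspecified.
func concatValues(m map[int]string) string {
	total := 0
	for _, v := range m {
		total += len(v)
	}

	var sb strings.Builder
	sb.Grow(total) // reserve the space once so the writes below do not reallocate
	for _, v := range m {
		sb.WriteString(v)
	}
	return sb.String()
}

func main() {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d"}
	s := concatValues(m)
	fmt.Println(len(s), s)
}
```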
As of 1.18, this is the best way:
https://stackoverflow.com/a/71635953/130427
Pre 1.18
You can use this maps package:
go get github.com/drgrib/maps
Then all you have to call is
values := maps.GetValuesIntString(m)
It's type-safe for that common map combination. You can generate other type-safe functions for any other type of map using the mapper tool in the same package.
Full disclosure: I am the creator of this package. I created it because I found myself rewriting these functions for map repeatedly.

Resources