Race condition while writing and reading from the map - dictionary

Following up on old post here.
I am iterating over the flatProduct.Catalogs slice and populating my productCatalog concurrent map in Go. I am using the Upsert method so that only unique product IDs are added to the productCatalog map.
The code below is called by multiple goroutines in parallel, which is why I am using a concurrent map to hold the data. It runs in the background and repopulates the concurrent map every 30 seconds.
var productRows []ClientProduct
err = json.Unmarshal(byteSlice, &productRows)
if err != nil {
return err
}
for i := range productRows {
flatProduct, err := r.Convert(spn, productRows[i])
if err != nil {
return err
}
if flatProduct.StatusCode == definitions.DONE {
continue
}
r.products.Set(strconv.FormatInt(flatProduct.ProductId, 10), flatProduct)
for _, catalogId := range flatProduct.Catalogs {
catalogValue := strconv.FormatInt(int64(catalogId), 10)
r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
productID := newValue.(int64)
if valueInMap == nil {
return map[int64]struct{}{productID: {}}
}
oldIDs := valueInMap.(map[int64]struct{})
// value is irrelevant, no need to check if key exists
// I think problem is here
oldIDs[productID] = struct{}{}
return oldIDs
})
}
}
And below are my getters in the same class where above code is there. These getters are used by main application threads to get data from the map or get the whole map.
func (r *clientRepository) GetProductMap() *cmap.ConcurrentMap {
return r.products
}
func (r *clientRepository) GetProductCatalogMap() *cmap.ConcurrentMap {
return r.productCatalog
}
func (r *clientRepository) GetProductData(pid string) *definitions.FlatProduct {
pd, ok := r.products.Get(pid)
if ok {
return pd.(*definitions.FlatProduct)
}
return nil
}
This is how I read data from the productCatalog cmap, but my system crashes on the range statement below:
// get productCatalog map which was populated above
catalogProductMap := clientRepo.GetProductCatalogMap()
productIds, ok := catalogProductMap.Get("211")
data, _ := productIds.(map[int64]struct{})
// I get a panic here after some time
for pid := range data {
...
}
The error I am getting is: fatal error: concurrent map iteration and map write.
I think the issue is that r.productCatalog is a concurrent map, but the oldIDs value stored inside it is a plain map, so writing oldIDs[productID] races with the for loop above that iterates over the same map.
How can I fix this race? One way I can think of is to make oldIDs a concurrent map as well, but with that approach my memory usage increases a lot and the process eventually goes OOM. Below is what I tried; it solves the race condition, but it increases memory far more than I want:
r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
productID := newValue.(int64)
if valueInMap == nil {
// return map[int64]struct{}{productID: {}}
return cmap.New()
}
// oldIDs := valueInMap.(map[int64]struct{})
oldIDs := valueInMap.(cmap.ConcurrentMap)
// value is irrelevant, no need to check if key exists
// oldIDs[productID] = struct{}{}
oldIDs.Set(strconv.FormatInt(productID, 10), struct{}{})
return oldIDs
})
Is there another approach that does not increase memory and still fixes the race condition I am seeing?
Note
I am still using v1 version of cmap without generics and it deals with strings as keys.

Rather than a plain map[int64]struct{} type, you could define a struct which holds the map and a mutex to control the access to the map:
type myMap struct{
m sync.Mutex
data map[int64]struct{}
}
func (m *myMap) Add(productID int64) {
m.m.Lock()
defer m.m.Unlock()
m.data[productID] = struct{}{}
}
func (m *myMap) List() []int64 {
m.m.Lock()
defer m.m.Unlock()
var res []int64
for id := range m.data {
res = append(res, id)
}
// sort slice if you need
return res
}
With the sample implementation above, you would have to be careful to store *myMap pointers (as opposed to plain myMap structs) in your cmap.ConcurrentMap structure.
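As a self-contained sketch of the same idea, the version below adds a constructor (the inner map has to be initialized, since writing to a nil map panics) and uses a sync.RWMutex so that readers do not block each other. The names idSet and newIDSet are hypothetical, and the cmap library is left out so the example runs with the standard library alone. List returns a snapshot slice, so callers range over a copy instead of the live map.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// idSet is a hypothetical mutex-guarded set of product IDs.
type idSet struct {
	mu   sync.RWMutex
	data map[int64]struct{}
}

// newIDSet returns a ready-to-use set with its inner map initialized.
func newIDSet() *idSet {
	return &idSet{data: make(map[int64]struct{})}
}

// Add inserts an ID under the write lock.
func (s *idSet) Add(id int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[id] = struct{}{}
}

// List returns a sorted snapshot taken under the read lock, so the
// caller can range over the result without holding any lock.
func (s *idSet) List() []int64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ids := make([]int64, 0, len(s.data))
	for id := range s.data {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	set := newIDSet()
	var wg sync.WaitGroup
	// Four goroutines write and read concurrently; each adds 10 distinct IDs.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(base int64) {
			defer wg.Done()
			for i := int64(0); i < 100; i++ {
				set.Add(base + i%10)
				_ = set.List()
			}
		}(int64(w) * 10)
	}
	wg.Wait()
	fmt.Println(len(set.List())) // prints "40"
}
```

Inside the Upsert callback you would then create the set with newIDSet() when valueInMap is nil, call Add on it, and return the same pointer. Compared with a nested concurrent map, each set costs one mutex plus one plain map.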

Related

How to avoid nested map allocations in my data structure?

I have the struct below, where CustomersIndex is a nested map that allocates a bunch of inner maps and increases memory usage; I noticed this while profiling. Is there any way to redesign my CustomersIndex data structure so that it doesn't use a nested map?
const (
departmentsKey = "departments"
)
type CustomerManifest struct {
Customers []definitions.Customer
CustomersIndex map[int]map[int]definitions.Customer
}
This is how it is populated in the code below:
func updateData(mdmCache *mdm.Cache) map[string]interface{} {
memCache := mdmCache.MemCache()
var customers []definitions.Customer
var customersIndex = map[int]map[int]definitions.Customer{}
for _, r := range memCache.Customer {
customer := definitions.Customer{
Id: int(r.Id),
SetId: int(r.DepartmentSetId),
}
customers = append(customers, customer)
_, yes := customersIndex[customer.SetId]
if !yes {
customersIndex[customer.SetId] = make(map[int]definitions.Customer)
}
customersIndex[customer.SetId][customer.Id] = customer
}
return map[string]interface{}{
departmentsKey: &CustomerManifest{Customers: customers, CustomersIndex: customersIndex},
}
}
And this is the way I am getting my CustomersIndex nested map.
func (c *Client) GetCustomerIndex() map[int]map[int]definitions.Customer {
c.mutex.RLock()
defer c.mutex.RUnlock()
customersIndex := c.data[departmentsKey].(*CustomerManifest).CustomersIndex
return customersIndex
}
Is there any way to design my CustomersIndex in a way where I don't have to use nested map?
You don't need to allocate a map until you put values in it.
type CustomerManifest struct {
Customers []definitions.Customer
CustomersIndex map[int]map[int]definitions.Customer
}
func (m *CustomerManifest) AddCustomerDefinition(x, y int, customer definitions.Customer) {
// Get the existing map, if exists.
innerMap := m.CustomersIndex[x]
// If it doesn't exist, allocate it.
if innerMap == nil {
innerMap = make(map[int]definitions.Customer)
m.CustomersIndex[x] = innerMap
}
// Add the value to the inner map, which now exists.
innerMap[y] = customer
}
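If the goal is to drop the inner maps entirely, one common alternative is a single map keyed by a small struct that combines both integers. The sketch below is only an illustration under that assumption: Customer stands in for definitions.Customer, and customerKey and buildIndex are hypothetical names that do not appear in the question.

```go
package main

import "fmt"

// Customer stands in for definitions.Customer from the question.
type Customer struct {
	Id    int
	SetId int
}

// customerKey is a hypothetical composite key that replaces the two map levels.
type customerKey struct {
	SetId int
	Id    int
}

// buildIndex stores every customer in one flat map, so only a single
// map is allocated no matter how many distinct SetId values exist.
func buildIndex(customers []Customer) map[customerKey]Customer {
	index := make(map[customerKey]Customer, len(customers))
	for _, c := range customers {
		index[customerKey{SetId: c.SetId, Id: c.Id}] = c
	}
	return index
}

func main() {
	index := buildIndex([]Customer{
		{Id: 1, SetId: 10},
		{Id: 2, SetId: 10},
		{Id: 1, SetId: 20},
	})
	c, ok := index[customerKey{SetId: 10, Id: 2}]
	fmt.Println(c.Id, c.SetId, ok) // prints "2 10 true"
	fmt.Println(len(index))        // prints "3"
}
```

The trade-off is that listing every customer in one set now means scanning the whole map or keeping a secondary index, so this fits best when lookups are always by the (SetId, Id) pair.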

How to upsert into concurrent map while iterating over regular map?

I need to have a map of string as key and unique (no dupes) int64 array as value so I decided to use something like below so that value can act as a set.
var customerCatalog = make(map[string]map[int64]bool)
The map above is populated with some data. Now I am trying to populate my concurrent map by reading from this regular customerCatalog map, but I am getting an error:
for k, v := range customerCatalog {
r.customerCatalog.Upsert(k, v, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
typedNewValue := newValue.([]int64)
if !exists {
return typedNewValue
}
typedValueInMap := valueInMap.([]int64)
return append(typedValueInMap, typedNewValue...)
})
}
This is the error I am getting. I am using the Upsert method as shown here
panic: interface conversion: interface {} is map[int64]bool, not []int64
What am I doing wrong?
I believe a minimal, reproducible, example of your issue would be as follows (playground):
conMap := cmap.New()
v := map[int64]bool{}
updateItemFn := func(exist bool, valueInMap interface{}, newValue interface{}) interface{} {
_ = newValue.([]int64)
return nil
}
conMap.Upsert("foo", v, updateItemFn)
Note: I have stripped out the loop etc. because it is irrelevant to the panic. However, you should note that because the loop iterates over a map[string]map[int64]bool, the type of v will be map[int64]bool.
The Upsert function looks up the key in the map and then passes the existing value (if any) and the value you passed in to your callback function.
So your function is receiving a map[int64]bool and the first thing it does is to assert that this is a []int64 (which will fail because it's not). To fix this you need to convert the map[int64]bool into a []int64. This could be done before calling the Upsert or within your implementation of UpsertCb as shown here (playground):
conMap := cmap.New()
conMap.Set("foo", []int64{5, 6})
v := map[int64]bool{
1: true,
}
updateItemFn := func(exist bool, valueInMap interface{}, newValue interface{}) interface{} {
m := newValue.(map[int64]bool)
a := make([]int64, 0, len(m))
for k := range m {
a = append(a, k)
}
if valueInMap == nil { // New value!
return a
}
typedValueInMap := valueInMap.([]int64)
return append(typedValueInMap, a...)
}
conMap.Upsert("foo", v, updateItemFn)
fmt.Println(conMap.Get("foo"))
The above has been kept simple to demonstrate the point; in reality you may want to add all of the values into a map so as to avoid duplicates.
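One way to do that de-duplication is sketched below, using only the standard library. mergeUnique is a hypothetical helper, not part of cmap; it collects the existing slice and the keys of the incoming map into a temporary set and returns the unique IDs, sorted only so that the output is stable.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeUnique returns the IDs from existing plus the keys of incoming,
// with duplicates removed. The result is sorted for stable output.
func mergeUnique(existing []int64, incoming map[int64]bool) []int64 {
	seen := make(map[int64]struct{}, len(existing)+len(incoming))
	for _, id := range existing {
		seen[id] = struct{}{}
	}
	for id := range incoming {
		seen[id] = struct{}{}
	}
	out := make([]int64, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	merged := mergeUnique([]int64{5, 6}, map[int64]bool{1: true, 5: true})
	fmt.Println(merged) // prints "[1 5 6]"
}
```

Inside the callback, the final append could then be replaced by a call such as mergeUnique(typedValueInMap, m), with a nil slice passed for the first argument when there is no existing value.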

Wrapping a pointer in Go

A library foo exposes a type A and a function Fn in that library returns a *A.
I have defined a "wrapper" for A called B:
type B foo.A
Can I convert the *A to a *B without dereferencing the A?
In other words, if I have
a := foo.Fn() // a is a *A
b := B(*a)
return &b
How can I convert the *a to a *b without using *a?
The reason that I ask is that in the library that I am using, github.com/coreos/bbolt, the *DB value returned from the Open function includes a sync.Mutex and so the compiler complains when I try to make a copy of the Mutex.
UPDATE TO EXPLAIN HOW I'LL USE THIS
I have a
type Datastore struct {
*bolt.DB
}
I also have a function (one of many) like this:
func (ds *Datastore) ReadOne(bucket, id string, data interface{}) error {
return ds.View(func(tx *bolt.Tx) error {
b, err := tx.CreateBucketIfNotExists([]byte(bucket))
if err != nil {
return fmt.Errorf("opening bucket %s: %v", bucket, err)
}
bytes := b.Get([]byte(id))
if bytes == nil {
return fmt.Errorf("id %s not found", id)
}
if err := json.Unmarshal(bytes, data); err != nil {
return fmt.Errorf("unmarshalling item: %v", err)
}
return nil
})
}
I would like to mock the underlying BoltDB database using a hash map. I ran into a problem mocking this because of the View expecting a function that takes bolt.Tx. That tx is then used to create a new bucket in CreateBucketIfNotExists. I cannot replace that anonymous function argument with one that calls my hash map mock version of CreateBucketIfNotExists.
I came up with this:
package boltdb
import (
"github.com/coreos/bbolt"
)
type (
bucket bolt.Bucket
// Bucket is a wrapper for bolt.Bucket to facilitate mocking.
Bucket interface {
ForEach(fn func([]byte, []byte) error) error
Get(key []byte) []byte
NextSequence() (uint64, error)
Put(key, value []byte) error
}
db bolt.DB
// DB is a wrapper for bolt.DB to facilitate mocking.
DB interface {
Close() error
Update(fn func(Tx) error) error
View(fn func(Tx) error) error
}
transaction bolt.Tx
// Tx is a wrapper for bolt.Tx to facilitate mocking.
Tx interface {
CreateBucketIfNotExists(name []byte) (Bucket, error)
}
)
// ForEach executes a function for each key/value pair in a bucket.
func (b *bucket) ForEach(fn func([]byte, []byte) error) error {
return ((*bolt.Bucket)(b)).ForEach(fn)
}
// Get retrieves the value for a key in the bucket.
func (b *bucket) Get(key []byte) []byte {
return ((*bolt.Bucket)(b)).Get(key)
}
// NextSequence returns an autoincrementing integer for the bucket.
func (b *bucket) NextSequence() (uint64, error) {
return ((*bolt.Bucket)(b)).NextSequence()
}
// Put sets the value for a key in the bucket.
func (b *bucket) Put(key, value []byte) error {
return ((*bolt.Bucket)(b)).Put(key, value)
}
// Close releases all database resources.
func (db *db) Close() error {
return ((*bolt.DB)(db)).Close()
}
// Update executes a function within the context of a read-write managed transaction.
func (db *db) Update(fn func(Tx) error) error {
return ((*bolt.DB)(db)).Update(func(tx *bolt.Tx) error {
t := transaction(*tx)
return fn(&t)
})
}
// View executes a function within the context of a managed read-only transaction.
func (db *db) View(fn func(Tx) error) error {
return ((*bolt.DB)(db)).View(func(tx *bolt.Tx) error {
t := transaction(*tx)
return fn(&t)
})
}
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist.
func (tx *transaction) CreateBucketIfNotExists(name []byte) (Bucket, error) {
b, err := ((*bolt.Tx)(tx)).CreateBucketIfNotExists(name)
if err != nil {
return nil, err
}
w := bucket(*b)
return &w, nil
}
So far, in my code, I am only using the functions shown above. I can add more if new code requires.
I will replace each bolt.DB with DB, bolt.Tx with Tx, and bolt.Bucket with Bucket in the real code. The mocker will use replacements for all three types that use the underlying hash map instead of storing to disk. I can then test all of my code, right down to the database calls.
You can simply / directly convert a value of type *A to a value of type *B, you just have to parenthesize *B:
a := foo.Fn() // a is a *A
b := (*B)(a)
return b
You can even convert the return value of the function call:
return (*B)(foo.Fn())
Try it on the Go Playground.
This is possible because of one of the cases listed in Spec: Conversions (paraphrased here):
A non-constant value x can be converted to type T if, ignoring struct tags, x's type and T are pointer types that are not named types, and their pointer base types have identical underlying types.
Both *A and *B are unnamed pointer types, and their base types A and B have identical underlying types, because B is declared as type B foo.A.
Note that this is a conversion rule and not assignability: a *A value cannot be assigned to a variable of type *B without the explicit conversion.
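A self-contained sketch of the conversion follows. A and newA are local stand-ins for the library's foo.A and foo.Fn (assumptions made so the example runs without the library), and Inc is a hypothetical wrapper method. Because only the pointer is converted, the embedded sync.Mutex is never copied and both pointers refer to the same value.

```go
package main

import (
	"fmt"
	"sync"
)

// A stands in for the library type (foo.A in the question).
type A struct {
	mu    sync.Mutex
	Count int
}

// B is a defined type with the same underlying type as A.
type B A

// Inc is a method that exists only on the wrapper type.
func (b *B) Inc() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.Count++
}

// newA stands in for foo.Fn and returns a *A.
func newA() *A { return &A{} }

func main() {
	a := newA()
	b := (*B)(a) // pointer conversion: no dereference, no copy of the mutex
	b.Inc()
	fmt.Println(a.Count)      // prints "1"
	fmt.Println((*A)(b) == a) // prints "true"
}
```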

Passing values to interface{}

Short
The following code does not exactly do what expected:
https://play.golang.org/p/sO4w4I_Lle
I assume that I mess up some pointer/reference stuff as usual, however I expect my...
func unmarshalJSON(in []byte, s interface{}) error
... and encoding/json's...
func Unmarshal(data []byte, v interface{}) error
...to behave the same way (e.g. update the value referenced by the second argument).
Long
The example above is a minimal reproducer that does not make much sense; it is kept small so that it works on the playground. A less minimal example that does make sense is this:
package main
import (
"fmt"
"gopkg.in/yaml.v2"
)
func unmarshalYAML(in []byte, s interface{}) error {
var result map[interface{}]interface{}
err := yaml.Unmarshal(in, &result)
s = cleanUpInterfaceMap(result)
// s is printed as expected
fmt.Println(s) // map[aoeu:[test aoeu] oaeu:[map[mahl:aoec tase:aoeu]]]
return err
}
func cleanUpInterfaceArray(in []interface{}) []interface{} {
out := make([]interface{}, len(in))
for i, v := range in {
out[i] = cleanUpMapValue(v)
}
return out
}
func cleanUpInterfaceMap(in map[interface{}]interface{}) map[string]interface{} {
out := make(map[string]interface{})
for k, v := range in {
out[fmt.Sprintf("%v", k)] = cleanUpMapValue(v)
}
return out
}
func cleanUpMapValue(v interface{}) interface{} {
switch v := v.(type) {
case []interface{}:
return cleanUpInterfaceArray(v)
case map[interface{}]interface{}:
return cleanUpInterfaceMap(v)
case string:
return v
default:
return fmt.Sprintf("%v", v)
}
}
func main() {
s := make(map[string]interface{})
b := []byte(`---
aoeu:
- test
- aoeu
oaeu:
- { tase: aoeu, mahl: aoec}
`)
err := unmarshalYAML(b, &s)
if err != nil {
panic(err)
}
// s is still an empty map
fmt.Println(s) // map[]
}
The idea is to unmarshal YAML to map[string]interface{} (instead of map[interface{}]interface{}) in order to allow serializing to JSON (where keys need to be strings). The unmarshalYAML function should provide the same func signature as yaml.Unmarshal...
Using Type assertion
Inside your unmarshalJSON() function the parameter s behaves like a local variable. When you assign something to it:
s = result
It will only change the value of the local variable.
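The difference can be seen in a small standalone sketch (reassign and writeThrough are hypothetical names used only for this illustration). The first function assigns to its parameter, which the caller never observes; the second writes through the pointer stored in the interface value.

```go
package main

import "fmt"

// reassign only changes its local copy of the interface value.
func reassign(s interface{}) {
	s = map[string]interface{}{"k": "v"}
	_ = s
}

// writeThrough changes the map that the caller's pointer refers to.
func writeThrough(s interface{}) {
	if p, ok := s.(*map[string]interface{}); ok {
		*p = map[string]interface{}{"k": "v"}
	}
}

func main() {
	m := map[string]interface{}{}

	reassign(&m)
	fmt.Println(len(m)) // prints "0"

	writeThrough(&m)
	fmt.Println(len(m)) // prints "1"
}
```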
Since you want it to work with changing the value of a *map[string]interface{} and that is what you pass to it, you could use a simple type assertion to obtain the map pointer from it, and pass this pointer to json.Unmarshal():
func unmarshalJSON(in []byte, s interface{}) error {
if m, ok := s.(*map[string]interface{}); !ok {
return errors.New("Expecting *map[string]interface{}")
} else {
return json.Unmarshal(in, m)
}
}
Try your modified, working example on the Go Playground.
Just passing it along
Note, however, that this is completely unnecessary, because json.Unmarshal() is also defined to take the destination as a value of type interface{}, the same thing you have. So you don't have to do anything except pass it along:
func unmarshalJSON(in []byte, s interface{}) error {
return json.Unmarshal(in, s)
}
Try this on the Go Playground.
With a variable of function type
As an interesting aside, note that the signatures of your unmarshalJSON() and the library function json.Unmarshal() are identical:
// Yours:
func unmarshalJSON(in []byte, s interface{}) error
// json package
func Unmarshal(data []byte, v interface{}) error
This means there is another option, that is you could use a variable named unmarshalJSON of a function type, and just simply assign the function value json.Unmarshal:
var unmarshalJSON func([]byte, interface{}) error = json.Unmarshal
Now you have a variable unmarshalJSON of function type, and you can call it as if it were a function:
err := unmarshalJSON(b, &s)
Try this function value on the Go Playground.
Now on to your unmarshalYAML() function
In your unmarshalYAML() you make the same mistake:
s = cleanUpInterfaceMap(result)
This will only change the value of your local s variable (parameter), and it will not "populate" the map (pointer) passed to unmarshalYAML().
Use the type assertion technique detailed above to obtain the pointer from the s interface{} argument, and once you have that, you can change the pointed object (the "outside" map).
func unmarshalYAML(in []byte, s interface{}) error {
var dest *map[string]interface{}
var ok bool
if dest, ok = s.(*map[string]interface{}); !ok {
return errors.New("Expecting *map[string]interface{}")
}
var result map[interface{}]interface{}
if err := yaml.Unmarshal(in, &result); err != nil {
return err
}
m := cleanUpInterfaceMap(result)
// m holds the results, dest is the pointer that was passed to us,
// we can just set the pointed object (map):
*dest = m
return nil
}

How to use global var across files in a package?

I have the following file structure:
models/db.go
type DB struct {
*sql.DB
}
var db *DB
func init() {
dbinfo := fmt.Sprintf("user=%s password=%s dbname=%s sslmode=disable",
DB_USER, DB_PASSWORD, DB_NAME)
db, err := NewDB(dbinfo)
checkErr(err)
rows, err := db.Query("SELECT * FROM profile")
checkErr(err)
fmt.Println(rows)
}
func NewDB(dataSourceName string) (*DB, error) {
db, err := sql.Open("postgres", dataSourceName)
if err != nil {
return nil, err
}
if err = db.Ping(); err != nil {
return nil, err
}
return &DB{db}, nil
}
models/db_util.go
func (p *Profile) InsertProfile() {
if db != nil {
_, err := db.Exec(...)
checkErr(err)
} else {
fmt.Println("DB object is NULL")
}
}
When I try to access db in the InsertProfile function, it reports a nil pointer. How do I access db in db_util.go?
I would not like to capitalize db (as it would give access to all the packages).
I am getting the QUERY returned from the db in init() correctly.
Edit: The problem is that you used Short variable declaration := and you just stored the created *DB value in a local variable and not in the global one.
This line:
db, err := NewDB(dbinfo)
Creates 2 local variables: db and err, and this local db has nothing to do with your global db variable. Your global variable will remain nil. You have to assign the created *DB to the global variable. Do not use short variable declaration but simple assignment, e.g:
var err error
db, err = NewDB(dbinfo)
if err != nil {
log.Fatal(err)
}
Original answer follows.
It's a pointer type, you have to initialize it before you use it. The zero value for pointer types is nil.
You don't have to export it (that's what starting it with a capital letter does). Note that it doesn't matter that you have multiple files as long as they are part of the same package, they can access identifiers defined in one another.
A good solution would be to do it in the package init() function which is called automatically.
Note that sql.Open() may just validate its arguments without creating a connection to the database. To verify that the data source name is valid, call DB.Ping().
For example:
var db *sql.DB
func init() {
var err error
db, err = sql.Open("yourdrivername", "somesource")
if err != nil {
log.Fatal(err)
}
if err = db.Ping(); err != nil {
log.Fatal(err)
}
}
icza has already correctly answered your specific problem but it's worth adding some additional explanation on what you're doing wrong so you understand how not to make the mistake in the future. In Go, the syntax := for assignment creates new variables with the names to the left of the :=, possibly shadowing package, or even parent scope function/method variables. As an example:
package main
import "fmt"
var foo string = "global"
func main() {
fmt.Println(foo) // prints "global"
// using := creates a new function scope variable
// named foo that shadows the package scope foo
foo := "function scope"
fmt.Println(foo) // prints "function scope"
printGlobalFoo() // prints "global"
if true {
foo := "nested scope"
fmt.Println(foo) // prints "nested scope"
printGlobalFoo() // prints "global"
}
// the foo created inside the if goes out of scope when
// the code block is exited
fmt.Println(foo) // prints "function scope"
printGlobalFoo() // prints "global"
if true {
foo = "nested scope" // note just = not :=
}
fmt.Println(foo) // prints "nested scope"
printGlobalFoo() // prints "global"
setGlobalFoo()
printGlobalFoo() // prints "new value"
}
func printGlobalFoo() {
fmt.Println(foo)
}
func setGlobalFoo() {
foo = "new value" // note just = not :=
}
Note that Go has no way to delete or unset a variable, so once you have shadowed a higher-scope variable (such as by creating a function-scope variable with the same name as a package-scope variable), there is no way to access the higher-scope variable within that code block.
Also be aware that := is shorthand for a var declaration with an initializer (var foo = ...). Both behave the same way when declaring a single new variable; however, := is only valid inside a function or method, while the var syntax is valid everywhere. With several variables on the left, := may also reuse variables already declared in the same scope, as long as at least one of them is new.
For anyone who came here and wants a quick answer:
In the db.go file:
package db
var db *DB
type DB struct {
*gorm.DB // or what database you want like *mongo.Client
}
func GetDB() *DB {
if db == nil {
db = ConnectToYourDbFunc("connection_string")
}
return db
}
Then in your other packages you can get it with just this:
db := db.GetDB()
That's all.
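One caveat with a lazily initialized getter like this is that the nil check is not safe when several goroutines call it at the same time. A possible variation, sketched here with the standard library's sync.Once, is shown below. The DB struct, the connect function, and the opens counter are hypothetical placeholders standing in for the real wrapper type and connection code.

```go
package main

import (
	"fmt"
	"sync"
)

// DB stands in for the wrapper type from the answer above.
type DB struct {
	dsn string
}

var (
	db     *DB
	dbOnce sync.Once
	opens  int // counts how often connect ran, for demonstration only
)

// connect is a hypothetical placeholder for the real connection code.
func connect(dsn string) *DB {
	opens++
	return &DB{dsn: dsn}
}

// GetDB initializes the package-level db exactly once, even when it
// is called from many goroutines at the same time.
func GetDB() *DB {
	dbOnce.Do(func() {
		db = connect("connection_string")
	})
	return db
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = GetDB()
		}()
	}
	wg.Wait()
	fmt.Println(opens) // prints "1"
}
```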

Resources