Go determine number of word occurrences in a string slice - dictionary

I'm having a hard time figuring out how to count the number of times each app (word) occurs in a slice, using the Go code I wrote.
Could someone help me work out how to count the occurrences?
https://play.golang.org/p/KvgI-lCz_c6
package main
import (
"fmt"
)
func main() {
apps := []string{"one", "two", "three", "one", "four"}
fmt.Println("apps:", apps)
o := CountOccurence(apps)
fmt.Println("=== o: ", o)
}
func CountOccurence(apps []string) map[string]int {
dict := make(map[string]int)
for k, v := range apps {
fmt.Println(k, v)
dict[v] = k
}
// fmt.Println("=== dict: ", dict)
return dict
}
Outputs the following
apps: [one two three one four]
0 one
1 two
2 three
3 one
4 four
=== o: map[four:4 one:3 three:2 two:1]
PS: Go's strings.Count only counts substrings within a single string; it does not work on a []string.

What you currently do is collect the distinct elements and assign each one its index. If a word occurs multiple times, the highest index is what ends up stored for it.
As you stated, you want to count the words. So instead of the index, store 1 for a new word (first occurrence), and if the word is already in the map, increment its value by 1.
Indexing a map with a key that is not present yields the zero value of the map's value type, which is 0 for int. A missing word therefore reads as "found 0 times so far", so you don't even have to check whether the key is already in there; just go ahead and increment it:
dict[v]++
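To make the zero-value behaviour concrete, here is a minimal sketch. The helper name bump is hypothetical and exists only for illustration; it is not part of the original code.

```go
package main

import "fmt"

// bump increments the count stored for word and returns the new count.
// (Hypothetical helper, used only to illustrate the zero-value behaviour.)
func bump(dict map[string]int, word string) int {
	// A missing key reads as 0, so the first increment yields 1.
	dict[word]++
	return dict[word]
}

func main() {
	dict := map[string]int{}

	// Reading an absent key returns the zero value and does not insert it.
	fmt.Println(dict["one"]) // 0

	fmt.Println(bump(dict, "one")) // 1
	fmt.Println(bump(dict, "one")) // 2

	// The comma-ok form tells "absent" apart from "stored as 0".
	_, ok := dict["two"]
	fmt.Println(ok) // false
}
```

The comma-ok form is only needed when a stored 0 must be distinguished from a missing key; for plain counting the bare increment is enough.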
So CountOccurence() may look like this:
func CountOccurence(apps []string) map[string]int {
dict := make(map[string]int)
for _, v := range apps {
fmt.Println(v)
dict[v]++
}
return dict
}
Which will output (try it on the Go Playground):
apps: [one two three one four]
one
two
three
one
four
=== o: map[four:1 one:2 three:1 two:1]

Related

How to slice a map[string]int into chunks

My objective is to take a map[string]int containing potentially up to a million entries, split it into chunks of up to 500 entries, and POST each chunk to an external service. I'm new to Go, so I'm tinkering in the Go Playground for now.
Any tips anyone has on how to improve the efficiency of my code base, please share!
Playground: https://play.golang.org/p/eJ4_Pd9X91c
The CLI output I'm seeing is:
original size 60
chunk bookends 0 20
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 20 40
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 40 60
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
The problem here is that while the chunk bookends are being calculated correctly, the x value is starting at 0 each time. I think I should expect it to start at the chunk bookend minimum, which would be 0, 20, 40, etc. How come the range is starting at zero each time?
Source:
package main
import (
"fmt"
"math/rand"
"strconv"
)
func main() {
items := make(map[string]int)
// Generate some fake data for our testing, in reality this could be 1m entries
for i := 0; i < 60; i ++ {
// int as strings are intentional here
items[strconv.FormatInt(int64(rand.Int()), 10)] = rand.Int()
}
// Create a slice of just the keys so we can easily chunk based on numeric indexes
i := 0
keys := make([]string, len(items))
for k := range items {
keys[i] = k
i++
}
fmt.Println("original size", len(keys))
//batchContents := make(map[string]int)
// Iterate numbers in the size batch we're looking for
chunkSize := 20
for chunkStart := 0; chunkStart < len(keys); chunkStart += chunkSize {
chunkEnd := chunkStart + chunkSize
if chunkEnd > len(items) {
chunkEnd = len(items)
}
// Iterate over the keys
fmt.Println("chunk bookends", chunkStart, chunkEnd)
for x := range keys[chunkStart:chunkEnd] {
fmt.Print(x, ",")
// Build the batch contents with the contents needed from items
// #todo is there a more efficient approach?
//batchContents[keys[i]] = items[keys[i]]
}
fmt.Println()
// #todo POST final batch contents
//fmt.Println(batchContents)
}
}
When you process a chunk:
for x := range keys[chunkStart:chunkEnd] {}
You are iterating over a slice, and with a single iteration variable, that variable is the slice index, not the element at that index. The sub-slice keys[chunkStart:chunkEnd] is indexed from 0, so x starts at 0 for every chunk. (When you iterate over a map, the first iteration variable is the key, because a map has no index, and the second is the value associated with that key.)
Instead you want this:
for _, key := range keys[chunkStart:chunkEnd] {}
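A small sketch of the difference, assuming a hypothetical helper chunkIndexes that records what range reports for a sub-slice next to the matching position in the full slice:

```go
package main

import "fmt"

// chunkIndexes returns, for each element of keys[start:end], the index that
// range reports (local) and the matching position in keys (global).
// The helper is hypothetical and only illustrates the indexing.
func chunkIndexes(keys []string, start, end int) (local, global []int) {
	for i := range keys[start:end] {
		local = append(local, i)         // restarts at 0 for every sub-slice
		global = append(global, start+i) // position in the original slice
	}
	return local, global
}

func main() {
	keys := []string{"a", "b", "c", "d", "e", "f"}

	local, global := chunkIndexes(keys, 2, 5)
	fmt.Println(local)  // [0 1 2]
	fmt.Println(global) // [2 3 4]

	// With two iteration variables the second one is the element itself.
	for i, key := range keys[2:5] {
		fmt.Println(i, key) // 0 c, then 1 d, then 2 e
	}
}
```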
Also note that it's redundant to first collect the keys in a slice and then process them. You can build the batches during a single pass over the map. Just keep track of how many entries the current batch holds so you know when it reaches the chunk size; that count can be implicit if the data structure already carries it (e.g. the length of a batch-keys slice).
For example (try it on the Go Playground):
chunkSize := 20
batchKeys := make([]string, 0, chunkSize)
process := func() {
fmt.Println("Batch keys:", batchKeys)
batchKeys = batchKeys[:0]
}
for k := range items {
batchKeys = append(batchKeys, k)
if len(batchKeys) == chunkSize {
process()
}
}
// Process last, potentially incomplete batch
if len(batchKeys) > 0 {
process()
}
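The same single-pass idea can produce map[string]int batches directly, which is closer to what the question wants to POST. This is a sketch only: forEachChunk and its callback are hypothetical names, size is assumed to be positive, and the actual HTTP call is left out.

```go
package main

import "fmt"

// forEachChunk calls process with successive sub-maps of items, each holding
// at most size entries. The name and signature are illustrative only, and
// size is assumed to be greater than zero.
func forEachChunk(items map[string]int, size int, process func(map[string]int)) {
	batch := make(map[string]int, size)
	for k, v := range items {
		batch[k] = v
		if len(batch) == size {
			process(batch)
			// Start a fresh map so process may safely keep the old one.
			batch = make(map[string]int, size)
		}
	}
	// Hand over the last, potentially incomplete batch.
	if len(batch) > 0 {
		process(batch)
	}
}

func main() {
	items := make(map[string]int)
	for i := 0; i < 45; i++ {
		items[fmt.Sprintf("key-%d", i)] = i
	}
	forEachChunk(items, 20, func(batch map[string]int) {
		// A real program would POST the batch here.
		fmt.Println("batch size:", len(batch))
	})
}
```

Allocating a new map per batch costs a little more than reusing one, but it avoids surprises if the callback stores the batch or sends it from another goroutine.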

How to convert a map to a slice of entries?

I'm trying to convert key-value map to slice of pairs, for example given a map like:
m := make(map[int64]int64)
m[521] = 4
m[528] = 8
How do I convert that into a slice of its entries, like: [[521, 4], [528, 8]]
I'm thinking about ranging over all those key-values then create slice for that, but is there any simple code to do that?
package main
import "fmt"
func main() {
//create a map
m := map[int64]int64{512: 8, 513: 9, 234: 9, 392: 0}
//create a slice to hold required values
s := make([][]int64, 0)
//range over map `m` to append to slice `s`
for k, v := range m {
// append each element, with a new slice []int64{k, v}
s = append(s, []int64{k, v})
}
fmt.Println(s)
}
Go 1.18
It is now possible to write a generic function to extract all key-value pairs, i.e. the map entries, with any key and value types.
Notes:
the map iterations are still unordered — using generics doesn't change that.
the constraint for the map key must be comparable
type Pair[K, V any] struct {
First K
Second V
}
func Entries[M ~map[K]V, K comparable, V any](m M) []Pair[K, V] {
entries := make([]Pair[K, V], 0)
for k, v := range m {
entries = append(entries, Pair[K, V]{k, v})
}
return entries
}
The type Pair here is used to preserve type safety in the return value. If you really must return a slice of slices, then it can only be [][]any (or [][2]any) in order to hold different types.
If the map key and value have the same type, of course you can still use Pair but you can also use a type-safe variation of the above:
func Entries[T comparable](m map[T]T) [][2]T {
entries := make([][2]T, 0)
for k, v := range m {
entries = append(entries, [2]T{k, v})
}
return entries
}
Again, T must be comparable or stricter in order to work as a map key.
Playground: https://go.dev/play/p/RwCGmp7MHKW
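Because map iteration is unordered, the entries come back in a different order from run to run. If a stable order matters, one option is to sort the slice afterwards. The sketch below fixes the key and value types to int64 to keep the comparison simple; sortedEntries is a hypothetical helper, not part of the answer above.

```go
package main

import (
	"fmt"
	"sort"
)

// Pair mirrors the Pair type from the answer above.
type Pair[K comparable, V any] struct {
	First  K
	Second V
}

// sortedEntries collects the entries of m and orders them by key.
// (Hypothetical helper; types are fixed to int64 for simplicity.)
func sortedEntries(m map[int64]int64) []Pair[int64, int64] {
	entries := make([]Pair[int64, int64], 0, len(m))
	for k, v := range m {
		entries = append(entries, Pair[int64, int64]{k, v})
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].First < entries[j].First
	})
	return entries
}

func main() {
	m := map[int64]int64{528: 8, 521: 4}
	fmt.Println(sortedEntries(m)) // [{521 4} {528 8}]
}
```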

Dynamic generation of subtour elimination constraints in AMPL for a PVRP

I am trying to code a Periodic Vehicle Routing Problem with some inventory constraints in AMPL. I would like to add the subtour constraints dynamically. To do this, I was inspired by this formulation for a TSP:
https://groups.google.com/d/msg/ampl/mVsFg4mAI1c/ZdfRHHRijfUJ
However, I cannot get it to eliminate subtours in my model. I used the following in my model file.
param T; # Number of time-periods
param V; # Number of vehicles
param F; # Number of fuel types
set P ordered; # Number of gas stations
param hpos {P} >= 0;
param vpos {P} >= 0;
set PAIRS := {p in P, j in P};
param dist {(p,j) in PAIRS}
:= sqrt((hpos[j]-hpos[p])**2 + (vpos[j]-vpos[p])**2);
# A binary variable to determine if an arc is traversed.
var H{(p,j) in PAIRS, v in 1..V, t in 1..T} binary;
# A binary variable to determine if a delivery of fuel is made to a station in a given time period.
var StationUsed{p in P, f in 1..F, v in 1..V, t in 1..T} binary;
minimize TransportationCost:
sum {(p,j) in PAIRS} sum {v in 1..V, t in 1..T} dist[p,j] * H[p,j,v,t];
param nSubtours >= 0 integer;
set SUB {1..nSubtours} within P;
subject to Subtour_Elimination {k in 1..nSubtours, m in SUB[k], v in 1..V, t in 1..T, f in 1..F}:
sum {p in SUB[k], j in P diff SUB[k]}
if (p,j) in PAIRS then H[p,j,v,t] else H[j,p,v,t] >=2 * StationUsed[m,f,v,t] ;
I added the StationUsed variable because my problem, unlike the TSP, does not have to visit all nodes in every time period. H is my binary decision variable declaring whether a vehicle travels the arc (p,j) in a time period.
Then I used a formulation similar to the TSP in my run file:
set NEWSUB;
set EXTEND;
let nSubtours := 0;
repeat {
solve;
let NEWSUB := {};
let EXTEND := {member(ceil(Uniform(0,card(P))),P)};
repeat {
let NEWSUB := NEWSUB union EXTEND;
let EXTEND := {j in P diff NEWSUB: exists {p in NEWSUB, v in 1..V, t in 1..T}
((p,j) in PAIRS and H[p,j,v,t] = 1 or (j,p) in PAIRS and H[j,p,v,t] = 1)};
} until card(EXTEND) = 0;
if card(NEWSUB) < card(P) then {
let nSubtours := nSubtours + 1;
let SUB[nSubtours] := NEWSUB;
display SUB;
} else break;
};
# Display the routes
display {t in 1..T, v in 1..V}: {(p,j) in PAIRS} H[p,j,v,t];
I am not sure whether the above is applicable to my problem with multiple vehicles and multiple time periods. I have tried defining v and t in let EXTEND, as they are needed to use H, but I am not sure whether this is a correct method. My model runs when formulated as above, but it does not eliminate the subtours. Do you have any suggestions in this regard?
ADDED QUESTION:
I found some inspiration in this model formulated in SAS/OR:
(A bit extensive to read and not necessary for my questions)
http://support.sas.com/documentation/cdl/en/ormpex/67518/HTML/default/viewer.htm#ormpex_ex23_sect009.htm
It eliminates subtours dynamically over d days and I figured it could be translated to my problem with multiple vehicles and multiple periods (days).
To specify my problem a little: a node can be visited only once, by a single vehicle, within a time period. Not all nodes have to be visited in every time period, which is a major difference from the TSP formulation, where all nodes are in the cycle.
I tried with the following approach:
The constraint in the model file is the same as before.
set P ordered; # Number of nodes
set PAIRS := {p in P, j in P: ord(p) != ord(j)};
param nSubtours >= 0 integer;
param iter >= 0 integer;
set SUB {1..nSubtours} within P;
subject to Subtour_Elimination {s in 1..nSubtours, k in SUB[s], f in F, v in V, t in T}:
sum {p in SUB[s], j in P diff SUB[s]}
if (p,j) in PAIRS then H[p,j,v,t] else H[j,p,v,t] >= 2 * StationUsed[k,f,v,t];
My run file looks like this:
let nSubtours := 0;
let iter := 0;
param num_components {V, T};
set P_TEMP;
set PAIRS_SOL {1..iter, V, T} within PAIRS;
param component_id {P_TEMP};
set COMPONENT_IDS;
set COMPONENT {COMPONENT_IDS};
param cp;
param cj;
# loop until each day and each vehicles support graph is connected
repeat {
let iter := iter + 1;
solve;
# Find connected components for each day
for {v in V, t in T} {
let P_TEMP := {p in P: exists {f in F} StationUsed[p,f,v,t] > 0.5};
let PAIRS_SOL[iter, v, t] := {(p,j) in PAIRS: H[p, j, v, t] > 0.5};
# Set each node to its own component
let COMPONENT_IDS := P_TEMP;
let num_components[v, t] := card(P_TEMP);
for {p in P_TEMP} {
let component_id[p] := p;
let COMPONENT[p] := {p};
};
# If p and j are in different components, merge the two component
for {(p,j) in PAIRS_SOL[iter, v, t]} {
let cp := component_id[p];
let cj := component_id[j];
if cp != cj then {
# update smaller component
if card(COMPONENT[cp]) < card(COMPONENT[cj]) then {
for {k in COMPONENT[cp]} let component_id[k] := cj;
let COMPONENT[cj] := COMPONENT[cj] union COMPONENT[cp];
let COMPONENT_IDS := COMPONENT_IDS diff {cp};
} else {
for {k in COMPONENT[cj]} let component_id[k] := cp;
let COMPONENT[cp] := COMPONENT[cp] union COMPONENT[cj];
let COMPONENT_IDS := COMPONENT_IDS diff {cj};
};
};
};
let num_components[v, t] := card(COMPONENT_IDS);
display num_components[v, t];
# create subtour from each component not containing depot node
for {k in COMPONENT_IDS: 1 not in COMPONENT[k]} { #***
let nSubtours := nSubtours + 1;
let SUB[nSubtours] := COMPONENT[k];
display SUB[nSubtours];
};
};
display num_components;
} until (forall {v in V, t in T} num_components[v,t] = 1);
I get a lot of "invalid subscript discarded", when running the model:
Error at _cmdno 43 executing "if" command
(file amplin, line 229, offset 5372):
error processing set COMPONENT:
invalid subscript COMPONENT[4] discarded.
Error at _cmdno 63 executing "for" command
(file amplin, line 245, offset 5951):
error processing set COMPONENT:
invalid subscript COMPONENT[3] discarded.
(...)
Bailing out after 10 warnings.
I think the script is doing what I am looking for, but it stops, when it has discarded 10 invalid subscripts.
When trying to debug I tested the second for loop.
for {p in P_TEMP} {
let component_id[p] := p;
let COMPONENT[p] := {p};
display component_id[p];
display COMPONENT[p];
};
This displays correctly, but only after a few "invalid subscript discarded" errors. It seems that p runs through some values that are not in P_TEMP. For example, when P_TEMP is a set consisting of nodes "1 3 4 5", I get "invalid subscript discarded" for component_id[2] and COMPONENT[2]. My guess is that something similar happens again later on in the IF-ELSE statement.
How do I avoid this?
Thank you,
Kristian
(previous answer text deleted because I misunderstood the implementation)
I'm not sure if this fully explains your issue, but I think there are a couple of problems with how you're identifying subtours.
repeat {
solve;
let NEWSUB := {};
let EXTEND := {member(ceil(Uniform(0,card(P))),P)};
repeat {
let NEWSUB := NEWSUB union EXTEND;
let EXTEND := {j in P diff NEWSUB: exists {p in NEWSUB, v in 1..V, t in 1..T}
((p,j) in PAIRS and H[p,j,v,t] = 1 or (j,p) in PAIRS and H[j,p,v,t] = 1)};
} until card(EXTEND) = 0;
if card(NEWSUB) < card(P) then {
let nSubtours := nSubtours + 1;
let SUB[nSubtours] := NEWSUB;
display SUB;
} else break;
};
What this does:
solves the problem
sets NEWSUB as empty
randomly picks one node from P as the starting point for EXTEND and adds this to NEWSUB
looks for any nodes not currently in NEWSUB which are connected to a node within NEWSUB by any vehicle journey on any day, and adds them to NEWSUB
repeats this process until there are no more to add (i.e. either NEWSUB equals P, the entire set of nodes, or there are no journeys between NEWSUB and non-NEWSUB nodes)
checks whether NEWSUB is smaller than P (in which case it identifies NEWSUB as a new subtour, appends it to SUB, and goes back to the start).
if NEWSUB has the same size as P (i.e. is equal to P) then it stops.
This should work for a single-vehicle problem with only a single day, but I don't think it's going to work for your problem. There are two reasons for this:
If your solution has different subtours on different days, it may not recognise them as subtours.
For example, consider a single-vehicle problem with two days, where your cities are A, B, C, D, E, F.
Suppose that the day 1 solution selects AB, BC, CD, DE, EF, FA, and the day 2 solution selects AB, BC, CA, DE, EF, FD. Day 1 has no subtour, but day 2 has two length-3 subtours, so this should not be a legal solution.
However, your implementation won't identify this. No matter which node you select as the starting point for NEWSUB, the day-1 routes connect it to all other nodes, so you end up with card(NEWSUB) = card(P). It doesn't notice that Day 2 has a subtour so it will accept this solution.
I'm not sure whether your problem allows for multiple vehicles to visit the same node on the same day. If it does, then you're going to run into the same sort of problem there, where a subtour for vehicle 1 isn't identified because vehicle 2 links that subtour to the rest of P.
Some of this could be fixed by doing subtour checking separately for each day and for each vehicle. But for the problem as you've described it, there's another issue...
Once the program has identified a closed route (i.e. a set of nodes that are all linked to one another, and not to any other nodes) then it needs to figure out whether this subtour should be prohibited.
For the basic TSP, this is straightforward. We have one vehicle that needs to visit every node - hence, if the cardinality of the subtour is smaller than the cardinality of all nodes, then we have an illegal subtour. This is handled by if card(NEWSUB) < card(P).
However, you state:
my problem unlike TSP does not have to visit all nodes in every timeperiod
Suppose Vehicle 1 travels A-B-C-A and Vehicle 2 travels D-E-F-D. In this case, these routes will look like illegal subtours because ABC and DEF are each smaller than ABCDEF and there are no routes that link them. If you use if card(NEWSUB) < card(P) as your criterion for a subloop that should be forbidden, you'll end up forcing every vehicle to visit all nodes, which is fine for basic TSP but not what you want here.
This one can be fixed by identifying how many nodes vehicle v visits on day t, and then comparing the length of the subtour to that total: e.g. if there are 10 cities total, vehicle 1 only visits 6 of them on day 1, and a "subtour" for vehicle 1 visits 6 cities, then that's fine, but if it visits 8 and has a subtour that visits 6, that implies it's travelling two disjoint subloops, which is bad.
One trap to watch out for here:
Suppose Day 1 requires vehicle 1 to visit ABCDEF. If we get a "solution" that has vehicle 1 ABCA and DEFD on one day, we might identify ABCA as a subtour that should be prevented.
However, if Day 2 has different requirements, it might be that having vehicle 1 travel ABCA (and no other nodes) is a legitimate solution for day 2. In this case, you don't want to forbid it on day 2 just because it was part of an illegal solution for day 1.
Similarly, you might have a "subroute" that is a legal solution for one vehicle but illegal for another.
To avoid this, you might need to maintain a different list of prohibited subroutes for each vehicle x day, instead of using one list for all. Unfortunately this is going to make your implementation a bit more complex.
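The two-day example above can be checked with a short sketch. It is written in Go rather than AMPL purely to illustrate the logic; the Arc type and the components function are hypothetical, and nothing here models the solver or the constraints themselves. It only shows that pooling arcs across days hides a subtour that a per-day check finds.

```go
package main

import "fmt"

// Arc is a trip between two nodes. The type is illustrative only.
type Arc struct{ From, To string }

// components groups the endpoints of arcs into connected components,
// ignoring arc direction. Nodes that appear in no arc are not reported.
func components(arcs []Arc) [][]string {
	parent := map[string]string{}
	var order []string // first-seen order, for deterministic output

	var find func(x string) string
	find = func(x string) string {
		if _, seen := parent[x]; !seen {
			parent[x] = x
			order = append(order, x)
		}
		for parent[x] != x {
			x = parent[x]
		}
		return x
	}

	// Merge the components of the two endpoints of every arc.
	for _, a := range arcs {
		ra, rb := find(a.From), find(a.To)
		if ra != rb {
			parent[ra] = rb
		}
	}

	// Collect the members of each component under its root.
	groups := map[string][]string{}
	var roots []string
	for _, n := range order {
		r := find(n)
		if _, ok := groups[r]; !ok {
			roots = append(roots, r)
		}
		groups[r] = append(groups[r], n)
	}
	result := make([][]string, 0, len(roots))
	for _, r := range roots {
		result = append(result, groups[r])
	}
	return result
}

func main() {
	day1 := []Arc{{"A", "B"}, {"B", "C"}, {"C", "D"}, {"D", "E"}, {"E", "F"}, {"F", "A"}}
	day2 := []Arc{{"A", "B"}, {"B", "C"}, {"C", "A"}, {"D", "E"}, {"E", "F"}, {"F", "D"}}

	// Checked per day, day 2 falls apart into two closed loops.
	fmt.Println(len(components(day1))) // 1
	fmt.Println(components(day2))      // [[A B C] [D E F]]

	// Pooled over both days, the day-1 arcs connect everything.
	both := append(append([]Arc{}, day1...), day2...)
	fmt.Println(len(components(both))) // 1
}
```

In the same spirit, a per-vehicle, per-day check would compare each component against the set of nodes that vehicle actually visits that day, rather than against all of P.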

How to randomly split a map in Go as evenly as possible?

I have a quick question. I am fairly new to golang. Say I have a map like so:
map[int]string
How could I randomly split it into two maps or arrays and as close to even as possible? So for example, if there are 15 items, it would be split 7 - 8.
For example:
func split(m map[int]string) (odds map[int]string, evens map[int]string) {
n := 1
odds = make(map[int]string)
evens = make(map[int]string)
for key, value := range m {
if n % 2 == 0 {
evens[key] = value
} else {
odds[key] = value
}
n++
}
return odds, evens
}
It is actually an interesting example, because it shows a few aspects of Go that are not obvious for beginners:
range m iterates over a map in an unspecified order that can differ from one run to the next, unlike in any other language as far as I know,
the modulo operator % returns the remainder of the integer division,
a function can return several values.
You could do something like this:
myStrings := make(map[int]string)
// Values are added to myStrings
myStrings2 := make(map[int]string)
// Seed system time for random numbers
rand.Seed(time.Now().UTC().UnixNano())
for k, v := range myStrings {
if rand.Float32() < 0.5 {
myStrings2[k] = v
delete(myStrings, k)
}
}
https://play.golang.org/p/6OnH1k4FMu
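Note that an independent coin flip per entry gives two halves of roughly equal size only on average. If the sizes must differ by at most one, a possible sketch is to shuffle the keys and cut the list in the middle. splitEvenly is a hypothetical name; on Go versions before 1.20 the global source should be seeded first, as in the snippet above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// splitEvenly distributes the entries of m at random over two maps whose
// sizes differ by at most one. (Hypothetical helper, a sketch only.)
func splitEvenly(m map[int]string) (a, b map[int]string) {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	// Shuffle explicitly instead of relying on map iteration order.
	rand.Shuffle(len(keys), func(i, j int) {
		keys[i], keys[j] = keys[j], keys[i]
	})

	half := len(keys) / 2
	a = make(map[int]string, half)
	b = make(map[int]string, len(keys)-half)
	for i, k := range keys {
		if i < half {
			a[k] = m[k]
		} else {
			b[k] = m[k]
		}
	}
	return a, b
}

func main() {
	m := make(map[int]string)
	for i := 0; i < 15; i++ {
		m[i] = fmt.Sprintf("item-%d", i)
	}
	a, b := splitEvenly(m)
	fmt.Println(len(a), len(b)) // 7 8
}
```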

Very Confusing variable changes

http://play.golang.org/p/Vd3meom5VF
I have this code for a context-free grammar in Go.
I have looked at this code many times and still don't see any reason for the struct values to change. Can anybody see why a change like the following happens?
Rules:
S -> . [DP VP]
VP -> . [V DP]
VP -> . [V DP AdvP]
After I run some functions as in the line
or2 = append(or2, OstarCF([]QRS{q}, []string{"sees"}, g2.Nullables(), g2.ChainsTo(g2.Nullables()))...)
Somehow my struct value is changed... I don't know why...
Rules:
S -> . [VP VP]
VP -> . [DP DP]
VP -> . [AdvP AdvP AdvP]
This should have been same as above.
Rules:
S -> DP,VP
VP -> V,DP
VP -> V,DP,AdvP
or2 := []QRS{}
g2 := ToGrammar(cfg2)
fmt.Printf("%s\n", g2)
for _, rule := range g2.Rules {
q := QRS{
one: rule.Src,
two: []string{},
three: rule.Right,
}
or2 = append(or2, OstarCF([]QRS{q}, []string{"sees"}, g2.Nullables(), g2.ChainsTo(g2.Nullables()))...)
}
fmt.Printf("%s\n", g2)
As you can see, I do not use any pointer to the variable rule; its fields are only used to instantiate another struct value. So how come the original struct field in rule has changed? The function OstarCF does not do anything to this field.
func OstarCF(Qs []QRS, R []string, nD map[string]bool, cD map[string][]string) []QRS {
symbols := []string{}
for _, r := range R {
symbols = append(symbols, cD[r]...)
}
product := []QRS{}
for _, Q := range Qs {
a := Q.one
b := Q.two
c := Q.three
if len(c) > 0 && CheckStr(c[0], symbols) {
b = append(b, c[0])
np := QRS{
one: a,
two: b,
three: c[1:],
}
product = append(product, np)
for len(np.three) > 0 && nD[np.three[0]] == true {
np.two = append(np.two, np.three[0])
np = QRS{
one: np.one,
two: np.two,
three: np.three[1:],
}
product = append(product, np)
}
}
}
return product
}
The original Rules field changes because pointers and slices (which behave like references as well) are involved.
Before OstarCF is called, the ChainsTo method is called. It takes the grammar object by value, so a copy is made, but the Rules field is a slice of pointers to rules. So when this field is copied, it still points to the data of the original object.
Then, in the ChainsTo method, there is a loop over the Rules field. It copies the Right field, which is a slice of strings (so it still points to the data of the original object):
rhs := rule.Right
Finally, a ns variable is declared by slicing rhs:
ns := rhs[:i]
ns = append(ns, rhs[i+1:]...)
At this stage, the ns variable still points to the buffer containing the slice of strings of the original object. Initially, i=0, so ns is an empty slice reusing the buffer. When items are appended, they replace the original data.
That's why your data are changed.
You can fix this problem by explicitly making a copy, for instance by replacing the above lines by:
ns := make( []string, 0, len(rhs) )
ns = append( ns, rhs[:i]...)
ns = append( ns, rhs[i+1:]...)
Go slices have replaced C pointer arithmetic, but they can be almost as dangerous/misleading in some cases.
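The effect can be reproduced in isolation. In the sketch below, removeAliased and removeCopied are hypothetical helpers that mirror the two variants discussed above; they are not taken from the original program.

```go
package main

import "fmt"

// removeAliased drops rhs[i] by appending onto rhs[:i]. The result shares
// the backing array of rhs, so the contents of rhs are overwritten.
func removeAliased(rhs []string, i int) []string {
	ns := rhs[:i]
	return append(ns, rhs[i+1:]...)
}

// removeCopied drops rhs[i] into a freshly allocated slice and leaves
// rhs untouched.
func removeCopied(rhs []string, i int) []string {
	ns := make([]string, 0, len(rhs)-1)
	ns = append(ns, rhs[:i]...)
	return append(ns, rhs[i+1:]...)
}

func main() {
	rule := []string{"V", "DP", "AdvP"}

	fmt.Println(removeCopied(rule, 0), rule)  // [DP AdvP] [V DP AdvP]
	fmt.Println(removeAliased(rule, 0), rule) // [DP AdvP] [DP AdvP AdvP]
}
```

Both helpers return the same result; the difference is only in what happens to the caller's slice, which is why the bug is easy to miss.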

Resources