Related
If I define a new struct as
mutable struct myStruct
data::AbstractMatrix
labels::Vector{String}
end
and I want to throw an error if the length of labels is not equal to the number of columns of data, I know that I can write a constructor that enforces this condition like
myStruct(data, labels) = length(labels) != size(data)[2] ? error("Labels incorrect length") : new(data,labels)
However, once the struct is initialized, the labels field can be set to the incorrect length:
m = myStruct(randn(2,2), ["a", "b"])
m.labels = ["a"]
Is there a way to throw an error if the labels field is ever set to length not equal to the number of columns in data?
You could use StaticArrays.jl to fix the matrix and vector's sizes to begin with:
using StaticArrays
mutable struct MatVec{R, C, RC, VT, MT}
data::MMatrix{R, C, MT, RC} # RC should be R*C
labels::MVector{C, VT}
end
but there's the downside of having to compile for every concrete type with a unique permutation of type parameters R,C,MT,VT. StaticArrays also does not scale as well as normal Arrays.
If you don't restrict dimensions in the type parameters (with all those downsides) and want to throw an error at runtime, you got good and bad news.
The good news is you can control whatever mutation happens to your type. m.labels = v would call the method setproperty!(object::myStruct, name::Symbol, v), which you can define with all the safeguards you like.
The bad news is that you can't control mutation to the fields' types. push!(m.labels, 1) mutates in the push!(a::Vector{T}, item) method. The myStruct instance itself doesn't actually change; it still points to the same Vector. If you can't guarantee that you won't do something like x = m.labels; push!(x, "whoops") , then you really do need runtime checks, like iscorrect(m::myStruct) = length(m.labels) == size(m.data)[2]
A good option is to not access the fields of your struct directly. Instead, do it using a function. Eg:
mutable struct MyStruct
data::AbstractMatrix
labels::Vector{String}
end
function modify_labels(s::MyStruct, new_labels::Vector{String})
# do all checks and modifications
end
You should check chapter 8 from "Hands-On Design Patterns and Best Practices with Julia: Proven solutions to common problems in software design for Julia 1.x"
I Go, I assumed slices were passed by reference, but this seems to work for values
but not for the array itself. For example, If I have this struct:
l := Line{
Points: []Point{
Point{3, 4},
},
}
I can define a variable, which gets passed a reference to the struct's slice
slice := l.Points
And then if I modify it, the original struct referenced by the variable
is going to reflect those modifications.
slice[0].X = 1000
fmt.Printf(
"This value %d is the same as this %d",
slice[0].X,
l.Points[0].X,
)
This differs from the behavior of arrays which, I assume, are passed by value.
So, for example, if I had defined the previous code using an array:
l := Line{
Points: [1]Point{
Point{3, 4},
},
}
arr := l.Points
arr[0].X = 1000
fmt.Println(arr.[0].X != s.Points[0].X) // equals true, original struct is untouched
Then, the l struct wouldn't have been modified.
Now, if I want to modify the slice itself I obviously cannot do this:
slice = append(slice, Point{99, 100})
Since that would only redefine the slice variable, losing the original reference.
I know I can simply do this:
l.Points = append(l.Points, Point{99, 100})
But, in some cases, it is more convenient to have another variable instead of having
to type the whole thing.
I tried this:
*slice = append(*slice, Point{99, 100})
But it doesn't work as I am trying to dereference something that apparently is not a pointer.
I finally tried this:
slice := &l.Points
*slice = append(l.Points, Point{99, 100})
And it works, but I am not sure what is happening. Why is the value of slice not overwritten? How does append works here?
Let's dispense first with a terminology issue. The Go language specification does not use the word reference the way you are using it. Go does however have pointers, and pointers are a form of reference. In addition, slices and maps are kind of special as there's some underlying data—the array underneath a slice, or the storage for a map—that may or may not already exist or be created by declaring or defining a variable whose type is slice of T or map[T1]T2 for some type T or type-pair T1 and T2.1
We can take your usage of the word reference to mean explicit pointer when talking about, e.g.:
func f1(p *int) {
// code ...
}
and the implied pointer when talking about:
func f2(m map[T1]T2) { ... }
func f3(s []T) { ... }
In f1, p really is a pointer: it thus refers to some actual int, or is nil. In f2, m refers to some underlying map, or is nil. In f3, s refers to some underlying array, or is nil.
But if you write:
l := Line{
Points: []Point{
Point{3, 4},
},
}
then you must have written:
type Line struct {
// ... maybe some fields here ...
Points []Point
// ... maybe more fields here ...
}
This Line is a struct type. It is not a slice type; it is not a map type. It contains a slice type but it is not itself one.
You now talk about passing these slices. If you pass l, you're passing the entire struct by value. It's pretty important to distinguish between that, and passing the value of l.Points. The function that receives one of these arguments must declare it with the right type.
For the most part, then, talking about references is just a red herring—a distraction from what's really going on. What we need to know is: What variables are you assigning what values, using what source code?
With all of that out of the way, let's talk about your actual code samples:
l.Points = append(l.Points, Point{99, 100})
This does just what it says:
Pass l.Points to append, which is a built-in as it is somewhat magically type-flexible (vs the rest of Go, where types are pretty rigid). It takes any value of type []T (slice of T, for any valid type T) plus one or more values of type T, and produces a new value of the same type, []T.
Assigns the result to l.Points.
When append does its work, it may:
receive nil (of the given type): in this case, it creates the underlying array, or
receive a non-nil slice: in this case, it writes into the underlying array or discards that array in favor of a new larger-capacity array as needed.2
So in all cases, the underlying array may have, in effect, just been created or replaced. It's therefore important that any other use of the same underlying array be updated appropriately. Assigning the result back to l.Points updates the—presumably one-and-only—slice variable that refers to the underlying array.
We can, however, break these assumptions:
s2 := l.Points
Now l.Points and s2 both refer to the (single) underlying array. Operations that modify that underlying array will, at least potentially, affect both s2 and l.Points.
Your second example is itself OK:
*slice = append(*slice, Point{99, 100})
but you haven't shown how slice itself was declared and/or assigned-to.
Your third example is fine as well:
slice := &l.Points
*slice = append(l.Points, Point{99, 100})
The first of these lines declares-and-initializes slice to point to l.Points. The variable slice therefore has type *[]Point. Its value—the value in slice, that is, rather than that in *slice—is the address of l.Points, which has type []Point.
The value in *slice is the value in l.Points. So you could write:
*slice = append(*slice, Point{99, 100})
here. Since *slice is just another name for l.Points, you can also write:
l.Points = append(*slice, Point{99, 100})
You only need to use *slice if there's some reason that l.Points is not available,3 but you may use *slice if that's more convenient. Reading *slice reads l.Points and updating *slice updates l.Points.
1To see what I mean by may or may not be created here, consider:
var s []int
vs:
var s = []int{42}
The first leaves s == nil while the second creates an underlying array with the capacity to hold the one int value 42, holding the one int value 42, so that s != nil.
2It's not clear to me whether there is a promise never to write on an existing slice-array whose capacity is greater than its current length, but not sufficient to hold the final result. That is, can append first append 10 objects to the existing underlying array, then discover that it needs a bigger array and expand the underlying array? The difference is observable if there are other slice values referring to the existing underlying array.
3Here, a classic example would occur if you have reason to pass l.Points or &l.Points to some existing (pre-written) function:
If you need pass l.Points—the slice value—to some existing function, that existing function cannot change the slice value, but could change the underlying array. That's probably a bad plan, so if it does do this, make sure that this is OK! If it only reads the slice and underlying array, that's a lot safer.
If you need to pass &l.Points—a value that points to the slice value—to some existing function, that existing function can change both the slice, and the underlying array.
If you're writing a new function, it's up to you to write it in whatever manner is most appropriate. If you're only going to read the slice and underlying array, you can take a value of type []Point. If you intend to update the slice in place, you should take a value of type *[]Point—pointer to slice of Point.
Append returns a new slice that may modify the original backing array of the initial slice. The original slice will still point to the original backing array, not the new one (which may or may not be in the same place in memory)
For example (playground)
slice := []int{1,2,3}
fmt.Println(len(slice))
// Output: 3
newSlice := append(slice, 4)
fmt.Println(len(newSlice))
// Output: 4
fmt.Println(len(slice))
// Output: 3
While a slice can be described as a "fat pointer to an array", it is not a pointer and therefore you can't dereference it, which is why you get an error.
By creating a pointer to a slice, and using append as you did above, you are setting the slice the pointer points to to the "new" slice returned by append.
For more information, check out Go Slice Usage And Internals
Your first attempt didn't work because slices are not pointers, they can be considered reference types. Append will modify the underlying array if it has enough capacity, otherwise it returns a new slice.
You can achieve what you want with a combination of your two attempts.
playground
l := Line{
Points: []Point{
Point{3, 4},
},
}
slice := &l.Points
for i := 0; i < 100; i++ {
*slice = append(*slice, Point{99 + i, 100 + i})
}
fmt.Println(l.Points)
I know that this might be sacrilegious, but, for me, it is useful to think of slices
as structs.
type Slice struct {
len int
cap int
Array *[n]T // Pointer to array of type T
}
Since in languages like C, the [] operator is also a dereferencing operator, we can think that every time we are accessing a slice, we are actually dereferencing the underlying array and assigning some value to it. That is:
var s []int
s[0] = 1
Might be thought of as equivalent to (in pseudo-code):
var s Slice
*s.Array[0] = 1
That is why we can say that slices are "pointers". For that reason, it can modify its underlying array like this:
myArray := [3]int{1,1,1}
mySlice := myArray[0:1]
mySlice = append(mySlice, 2, 3) // myArray == mySlice
Modifying mySlice also modifies myArray, since the slice stores a pointer to the array and, on appending, we are dereferencing that pointer.
This behavior, nonetheless, is not always like this. If we exceed the capacity of the original array, a new array is created and the original array is left untouched.
myArray := [3]int{1,1,1}
mySlice := myArray[0:1]
mySlice = append(mySlice, 2, 3, 4, 5) // myArray != mySlice
The confusion arises when we try to treat the slice itself as an actual pointer. Since we can modify an underlying array by appending to it, we are led to believe that in this case:
sliceCopy := mySlice
sliceCopy = append(sliceCopy, 6)
both slices, slice and sliceCopy are the same, but they are not. We have to explicitly pass a reference to the memory address of the slice (using the & operator) in order to modify it. That is:
sliceAddress := &mySlice
*sliceAddress = append(mySlice, 6) // or append(*sliceAddress, 6)
See also
https://forum.golangbridge.org/t/slice-pass-as-value-or-pointer/2866/4
https://blog.golang.org/go-slices-usage-and-internals
https://appliedgo.net/slices/
When the formal parameter is map, assigning a value directly to a formal parameter cannot change the actual argument, but if you add a new key and value to the formal parameter, the actual argument outside the function can also be seen. Why is that?
I don't understand the output value of the following code, and the formal parameters are different from the actual parameters.
unc main() {
t := map[int]int{
1: 1,
}
fmt.Println(unsafe.Pointer(&t))
copysss(t)
fmt.Println(t)
}
func copysss(m map[int]int) {
//pointer := unsafe.Pointer(&m)
//fmt.Println(pointer)
m = map[int]int{
1: 2,
}
}
stdout :0xc000086010
map[1:1]
func main() {
t := map[int]int{
1: 1,
}
fmt.Println(unsafe.Pointer(&t))
copysss(t)
fmt.Println(t)
}
func copysss(m map[int]int) {
//pointer := unsafe.Pointer(&m)
//fmt.Println(pointer)
m[1] = 2
}
stdout :0xc00007a010
map[1:2]
func main() {
t := map[int]int{
1: 1,
}
fmt.Println(unsafe.Pointer(&t))
copysss(t)
fmt.Println(t)
}
func copysss(m map[int]int) {
pointer := unsafe.Pointer(&m)
fmt.Println(pointer)
m[1] = 2
}
stdout:0xc00008a008
0xc00008a018
map[1:2]
I want to know if the parameter is a value or a pointer.
The parameter is both a value and a pointer.
Wait.. whut?
Yes, a map (and slices, for that matter) are types, pretty similar to what you would implement. Think of a map like this:
type map struct {
// meta information on the map
meta struct{
keyT type
valueT type
len int
}
value *hashTable // pointer to the underlying data structure
}
So in your first function, where you reassign m, you're passing a copy of the struct above (pass by value), and you're assigning a new map to it, creating a new hashtable pointer in the process. The variable in the function scope is updated, but the one you passed still holds a reference to the original map, and with it, the pointer to the original map is preserved.
In the second snippet, you're accessing the underlying hash table (a copy of the pointer, but the pointer points to the same memory). You're directly manipulating the original map, because you're just changing the contents of the memory.
So TL;DR
A map is a value, containing meta information of what the map looks like, and a pointer to the actual data stored inside. The pointer is passed by value, like anything else (same way pointers are passed by value in C/C++), but of course, dereferencing a pointer means you're changing the values in memory directly.
Careful...
Like I said, slices work pretty much in the same way:
type slice struct {
meta struct {
type T
len, cap int
}
value *array // yes, it's a pointer to an underlying array
}
The underlying array is of say, a slice of ints will be [10]int if the cap of the slice is 10, regardless of the length. A slice is managed by the go runtime, so if you exceed the capacity, a new array is allocated (twice the cap of the previous one), the existing data is copied over, and the slice value field is set to point to the new array. That's the reason why append returns the slice that you're appending to, the underlying pointer may have changed etc.. you can find more in-depth information on this.
The thing you have to be careful with is that a function like this:
func update(s []int) {
for i, v := range s {
s[i] = v*2
}
}
will behave much in the same way as the function you have were you're assigning m[1] = 2, but once you start appending, the runtime is free to move the underlying array around, and point to a new memory address. So bottom line: maps and slices have an internal pointer, which can produce side-effects, but you're better off avoiding bugs/ambiguities. Go supports multiple return values, so just return a slice if you set about changing it.
Notes:
In your attempt to figure out what a map is (reference, value, pointer...), I noticed you tried this:
pointer := unsafe.Pointer(&m)
fmt.Println(pointer)
What you're doing there, is actually printing the address of the argument variable, not any address that actually corresponds to the map itself. the argument passed to unsafe.Pointer isn't of the type map[int]int, but rather it's of type *map[int]int.
Personally, I think there's too much confusion around passing by value vs passing by . Go works exactly like C in this regard, just like C, absolutely everything is passed by value. It just so happens that this value can sometimes be a memory address (pointer).
More details (references)
Slices: usage & internals
Maps Note: there's some confusion caused by this one, as pointers, slices, and maps are referred to as *reference types*, but as explained by others, and elsewhere, this is not to be confused with C++ references
In Go, map is a reference type. This means that the map actually resides in the heap and variable is just a pointer to that.
The map is passed by copy. You can change the local copy in your function, but this will not be reflected in caller's scope.
But, since the map variable is a pointer to the unique map residing in the heap, every change can be seen by any variable that points to the same map.
This article can clarify the concept: https://www.ardanlabs.com/blog/2014/12/using-pointers-in-go.html.
First, my definition of composite key - two ore more values combine to make the key. Not to confuse with composite keys in databases.
My goal is to save computed values of pow(x, y) in a hash table, where x and y are integers. This is where I need ideas on how to make a key, so that given x and y, I can look it up in the hash table, to find pow(x,y).
For example:
pow(2, 3) => {key(2,3):8}
What I want to figure out is how to get the map key for the pair (2,3), i.e. the best way to generate a key which is a combination of multiple values, and use it in hash table.
The easiest and most flexible way is to use a struct as the key type, including all the data you want to be part of the key, so in your case:
type Key struct {
X, Y int
}
And that's all. Using it:
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 = ", m[Key{2, 2}])
fmt.Println("2^3 = ", m[Key{2, 3}])
Output (try it on the Go Playground):
2^2 = 4
2^3 = 8
Spec: Map types: You may use any types as the key where the comparison operators == and != are fully defined, and the above Key struct type fulfills this.
Spec: Comparison operators: Struct values are comparable if all their fields are comparable. Two struct values are equal if their corresponding non-blank fields are equal.
One important thing: you should not use a pointer as the key type (e.g. *Key), because comparing pointers only compares the memory address, and not the pointed values.
Also note that you could also use arrays (not slices) as key type, but arrays are not as flexible as structs. You can read more about this here: Why have arrays in Go?
This is how it would look like with arrays:
type Key [2]int
m := map[Key]int{}
m[Key{2, 2}] = 4
m[Key{2, 3}] = 8
fmt.Println("2^2 = ", m[Key{2, 2}])
fmt.Println("2^3 = ", m[Key{2, 3}])
Output is the same. Try it on the Go Playground.
Go can't make a hash of a slice of ints.
Therefore the way I would approach this is mapping a struct to a number.
Here is an example of how that could be done:
package main
import (
"fmt"
)
type Nums struct {
num1 int
num2 int
}
func main() {
powers := make(map[Nums]int)
numbers := Nums{num1: 2, num2: 4}
powers[numbers] = 6
fmt.Printf("%v", powers[input])
}
I hope that helps
Your specific problem is nicely solved by the other answers. I want to add an additional trick that may be useful in some corner cases.
Given that map keys must be comparable, you can also use interfaces. Interfaces are comparable if their dynamic values are comparable.
This allows you to essentially partition the map, i.e. to use multiple types of keys within the same data structure. For example if you want to store in your map n-tuples (it wouldn't work with arrays, because the array length is part of the type).
The idea is to define an interface with a dummy method (but it can surely be not dummy at all), and use that as map key:
type CompKey interface {
isCompositeKey() bool
}
var m map[CompKey]string
At this point you can have arbitrary types implementing the interface, either explicitly or by just embedding it.
In this example, the idea is to make the interface method unexported so that other structs may just embed the interface without having to provide an actual implementation — the method can't be called from outside its package. It will just signal that the struct is usable as a composite map key.
type AbsoluteCoords struct {
CompKey
x, y int
}
type RelativeCoords struct {
CompKey
x, y int
}
func foo() {
p := AbsoluteCoords{x: 1, y: 2}
r := RelativeCoords{x: 10, y: 20}
m[p] = "foo"
m[r] = "bar"
fmt.Println(m[AbsoluteCoords{x: 10, y: 20}]) // "" (empty, types don't match)
fmt.Println(m[RelativeCoords{x: 10, y: 20}]) // "bar" (matches, key present)
}
Of course nothing stops you from declaring actual methods on the interface, that may be useful when ranging over the map keys.
The disadvantage of this interface key is that it is now your responsibility to make sure the implementing types are actually comparable. E.g. this map key will panic:
type BadKey struct {
CompKey
nonComparableSliceField []int
}
b := BadKey{nil, []int{1,2}}
m[b] = "bad!" // panic: runtime error: hash of unhashable type main.BadKey
All in all, this might be an interesting approach when you need to keep two sets of K/V pairs in the same map, e.g. to keep some sanity in function signatures or to avoid defining structs with N very similar map fields.
Playground https://play.golang.org/p/0t7fcvSWdy7
I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
def __init__(self, target, weight):
self.target = target
self.weight = weight
class WeightedGraph(object):
def __init__(self, initial_size):
self.adjacency_list = [[] for i in range(initial_size)]
self.size = initial_size
self.edge_count = 0
def add_edge(self, source, target, weight):
self.adjacency_list[source].append(Edge(target, weight))
self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
target: uint,
weight: f64
}
struct WeightedGraph {
adjacency_list: ~Vec<~Vec<Edge>>,
size: uint,
edge_count: int
}
impl WeightedGraph {
fn new(num_nodes: uint) -> WeightedGraph {
let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
WeightedGraph {
adjacency_list: adjacency_list,
size: num_nodes,
edge_count: 0
}
}
fn add_edge(mut self, source: uint, target: uint, weight: f64) {
self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
self.edge_count += 1;
}
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references, you have to use get_mut if you want to modify the data
You only need Vec<Vec<Edge>>, Vec is the right thing to use, ~[] was for that purpose in the past but now means something else (or will, not sure if that is changed already)
You also have to change the signature of add_edge to take &mut self because now you are moving the ownership of self to add_edge and that is not what you want