Is there a more idiomatic way of creating an index buffer in Rust?

In computer graphics, one of the most basic patterns is creating several buffers for vertex attributes and an index buffer that groups these attributes together.
In Rust, this basic pattern looks like this:
struct Position { /* ... */ }
struct Uv { /* ... */ }

struct Vertex {
    pos: usize,
    uv: usize,
}

// v_vertex[_].pos is an index into v_pos
// v_vertex[_].uv is an index into v_uv
struct Object {
    v_pos: Vec<Position>,  // attribute buffer
    v_uv: Vec<Uv>,         // attribute buffer
    v_vertex: Vec<Vertex>, // index buffer
}
However, this pattern leaves a lot to be desired. Any operation on the attribute buffers that modifies existing data has to be primarily concerned with making sure the index buffer isn't invalidated. In short, it leaves most of the promises the Rust compiler is capable of making on the table.
Making a scheme like this work isn't impossible. For example, the struct could include a HashMap that keeps track of changed indices and rebuilds the vectors when it gets too large. However, all of these workarounds inevitably feel like yet another hack that doesn't address the underlying problem: there is no compile-time guarantee that I'm not introducing data races, or that the "reference" hasn't been invalidated somewhere else by accident.
When I first approached this problem, moving over from C++ to Rust, I tried to make the Vertex struct hold references to the attributes. That looked something like this:
struct Position { /* ... */ }
struct Uv { /* ... */ }

struct Vertex<'a> {
    pos: &'a Position,
    uv: &'a Uv,
}

// obj.v_vertex[_].pos is a reference to an element of obj.v_pos
// obj.v_vertex[_].uv is a reference to an element of obj.v_uv
// This causes a lot of problems; it's effectively impossible to use.
struct Object<'a> {
    v_pos: Vec<Position>,
    v_uv: Vec<Uv>,
    v_vertex: Vec<Vertex<'a>>,
}
...Which threw me deep down the rabbit hole of self-referential structs and why they cannot exist in safe Rust. After learning more about them, it turned out that, as I suspected, the original implementation hid a lot of unsafety pitfalls, which the compiler caught once I started being more explicit.
I'm aware of the existence of unsafe solutions like Pin, but I feel like at that point I might as well stick with the original method.
This leads me to the core question: Is there an idiomatic way of representing this relationship? I want to be able to modify the contents of each of the Vecs in a compiler-checked manner.

Is there an idiomatic way of representing this relationship?
The usizes you started with are the idiomatic way of representing this relationship.
Any operations to the attribute buffers that modifies existing data is going to be primarily concerned with making sure the index buffer isn't invalidated. …
Yes; you should write those operations within the module that defines Object, and keep the fields private so the Object cannot become inconsistent as long as those operations are correctly defined.
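For concreteness, here is a minimal sketch of that encapsulation; the method names push_pos and push_vertex are hypothetical, and error handling is reduced to a bool for brevity:

mod mesh {
    pub struct Position { /* ... */ }
    pub struct Uv { /* ... */ }

    // The fields stay private, so a Vertex can only be created
    // by code inside this module.
    pub struct Vertex {
        pos: usize,
        uv: usize,
    }

    pub struct Object {
        v_pos: Vec<Position>,
        v_uv: Vec<Uv>,
        v_vertex: Vec<Vertex>,
    }

    impl Object {
        // Appending an attribute can never invalidate existing indices.
        pub fn push_pos(&mut self, p: Position) -> usize {
            self.v_pos.push(p);
            self.v_pos.len() - 1
        }

        // Every index stored in v_vertex has passed this bounds check,
        // so the buffers cannot get out of sync.
        pub fn push_vertex(&mut self, pos: usize, uv: usize) -> bool {
            if pos < self.v_pos.len() && uv < self.v_uv.len() {
                self.v_vertex.push(Vertex { pos, uv });
                true
            } else {
                false
            }
        }
    }
}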
In short, it leaves most of the promises the rust compiler is capable of making on the table.
It doesn't — because the Rust compiler is not actually capable of making those promises. & and even &mut references are actually very limited — they work by statically enforcing “Nobody (else) is going to change this value while you have the reference”. They don't have any bigger picture than that. In your case, assuming you're planning to edit this data, you will need to do operations that modify multiple parts in a consistent fashion, like “add a Position and also a Vertex that uses it”, or maybe “simultaneously add 3 vertices making up a triangle, using these 3 existing Positions”. References cannot help you do this correctly.
The only kind of data structure of this sort that you can in fact build using references is an append-only one, using the help of, for example, typed-arena. This might be suitable for an algorithm which is building a mesh. However, given that it's append-only, there is very little benefit — the operation “append a vertex, choosing indices as you go” is easy to write correctly without references. Additionally, you won't be able to store the mesh constructed that way long-term (because it is made of vectors that borrow from the arena) unless you also throw in ouroboros to wrap up the self-reference.
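To illustrate the append-only pattern, here is a minimal sketch using the typed-arena crate; the Position and Vertex shapes are placeholders:

use typed_arena::Arena;

struct Position { x: f32, y: f32 }

struct Vertex<'a> {
    pos: &'a Position,
}

// Arena::alloc takes &self, so we can keep allocating while earlier
// references are still alive; but nothing can ever be removed or
// relocated, and the resulting Vec borrows from the arena.
fn build(arena: &Arena<Position>) -> Vec<Vertex<'_>> {
    let p = arena.alloc(Position { x: 0.0, y: 0.0 });
    vec![Vertex { pos: p }]
}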
Fundamentally, references are designed to be used as temporary things — as a formalization and enforcement of common patterns used in C and C++ when passing and returning pointers — hence also being called “borrows”. The rules which the compiler understands about references are rules designed to handle those temporary uses. They are almost never what you should be building a data structure out of.

Related

What persistent data structures does Raku/Rakudo include?

Raku provides many types that are immutable and thus cannot be modified after they are created. Until I started looking into this area recently, my understanding was that these types were not persistent data structures – that is, unlike the core types in Clojure or Haskell, my belief was that Raku's immutable types did not take advantage of structural sharing to allow for inexpensive copies. I thought that the statement my List $new = (|$old-list, 42); literally copied the values in $old-list, without the data-sharing features of persistent data structures.
That description of my understanding is in the past tense, however, due to the following code:
my Array $a = do {
    $_ = [rand xx 10_000_000];
    say "Initialized an Array in $((now - ENTER now).round: .001) seconds"; $_ }
my List $l = do {
    $_ = |(rand xx 10_000_000);
    say "Initialized the List in $((now - ENTER now).round: .001) seconds"; $_ }
do { $a.push: rand;
     say "Pushed the element to the Array in $((now - ENTER now).round: .000001) seconds" }
do { my $nl = (|$l, rand);
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
do { my @na = |$l;
     say "Copied List \$l into a new Array in $((now - ENTER now).round: .001) seconds" }
which produced this output in one run:
Initialized an Array in 5.938 seconds
Initialized the List in 5.639 seconds
Pushed the element to the Array in 0.000109 seconds
Appended an element to the List in 0.000109 seconds
Copied List $l into a new Array in 11.495 seconds
That is, creating a new List with the old values + one more is just as fast as pushing to a mutable Array, and dramatically faster than copying the List into a new Array – exactly the performance characteristics that you'd expect to see from a persistent List (copying to an Array is still slow because it can't take advantage of structural sharing without breaking the immutability of the List). The fast copying of $l into $nl is not due to either List being lazy; neither are.
All of the above leads me to believe that Lists in Rakudo actually are persistent data structures, with all the performance benefits that implies. That leaves me with several questions:
Am I right about Lists being persistent data structures?
Are all other immutable Types also persistent data structures? Or are any?
Is any of this part of Raku, or just an implementation choice Rakudo has made?
Are any of these performance characteristics documented/guaranteed anywhere?
I have to say, I am both extremely impressed and more than a bit baffled to discover evidence that at least some of Raku(do)'s types are persistent. It's the sort of feature that other languages list as a key selling point or that leads to the creation of libraries with 30k+ stars on GitHub. Have we really had it in Raku without even mentioning it?
I remember implementing these semantics, and I certainly don't recall thinking about them giving rise to a persistent data structure at the time - although it does seem fair to attach that label to the result!
I don't think you'll find anywhere that explicitly spells out this exact behavior; however, the most natural implementation of the things the language does require leads quite directly to it. Taking the ingredients:
The infix:<,> operator is the List constructor in Raku
When a List is created, it is non-committal with regards to laziness and flattening (these arise from how we use the List, which we don't - in general - know at the point of its construction)
When we write (|$x, 1), the prefix:<|> operator constructs a Slip, which is a kind of List that should melt into its surrounding List. Thus what infix:<,> sees is a Slip and an Int.
Making the Slip melt into the result List immediately would mean making a commitment about eagerness, which List construction alone should not do. Thus the Slip and everything after it is placed into the lazily evaluated ("non-reified") portion of the List.
The last of these is what gives rise to the observed persistent-data-structure-style behavior.
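A small demonstration of those ingredients, using nothing beyond core Raku:

my $l  = (1, 2, 3);   # infix:<,> constructs a List
my $nl = (|$l, 4);    # prefix:<|> makes a Slip, which melts into the new List
say $nl;              # (1 2 3 4)
# The slipped portion starts out in $nl's non-reified part,
# so the elements of $l are shared rather than copied.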
I expect it would be possible to have an implementation that inspects the Slip, chooses to eagerly copy things that are known not to be lazy, and still complies with the specification test suite. That would change the time complexity of your example. If you want to be defensive against that, then:
do { my $nl = (|$l.lazy, rand);
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
should be sufficient to force the issue even if the implementation changed.
Other cases that immediately come to mind that relate to persistent data structures, or at least tail sharing:
The MoarVM implementation of strings, which is behind str and thus Str, implements string concatenation by creating a new string that refers to the two strings being concatenated instead of copying their data (and does similar tricks for substr and repetition). This is strictly an optimization, not a language requirement, and in some delicate cases (when the last grapheme of one string and the first grapheme of the next would form a single grapheme in the resulting string), it gives up and takes the copying path.
Outside of the core, modules like Concurrent::Stack, Concurrent::Queue, and Concurrent::Trie use tail sharing as a technique to implement relatively efficient lock-free data structures.

Golang RWMutex on map content edit

I'm starting to use RWMutex in my Go project with a map, since I now have more than one goroutine running at the same time, and while making all of the changes for that, a doubt came to my mind.
The thing is that I know we must use RLock when only reading, to allow other goroutines to do the same, and Lock when writing, to fully block the map. But what are we supposed to do when editing a previously created element in the map?
For example... Let's say I have a map[int]string where I Lock, put "hello " inside, and then Unlock. What if I now want to append "world" to it? Should I use Lock, or can I use RLock?
You should approach the problem from another angle.
A simple rule of thumb you seem to understand just fine is
You need to protect the map from concurrent accesses when at least one of them is a modification.
Now the real question is what constitutes a modification of a map.
To answer it properly, it helps to notice that values stored in maps are not addressable — by design. This was engineered that way simply because maps internally have an intricate implementation which might move the values they contain around in memory, in order to provide (amortized) fast access time when the map's structure changes due to insertions and/or deletions of its elements.
The fact that map values are not addressable means you cannot do something like:

m := make(map[int]string)
m[42] = "hello"
go mutate(&m[42]) // take a single element and go off modifying it...
m[123] = "blah blah" // ...while other parts of the program change _other_ values
The reason you are not allowed to do this is that the insertion operation m[123] = ... might trigger moving the map's element storage around, and that might involve moving the storage of the element keyed by 42 to some other place in memory — pulling the rug out from under the feet of the goroutine running the mutate function.
So, in Go, maps really only support three operations:
Insert — or replace — an element;
Read an element;
Delete an element.
You cannot modify an element "in place" — you can only proceed in three steps:
Read the element;
Modify the variable containing the (read) copy;
Replace the element by the modified copy.
As you can now see, steps (1) and (3) are mere map accesses, and so the answer to your question is (hopefully) apparent: step (1) must be done under at least a read lock, and step (3) must be done under a write (exclusive) lock.
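Here is a minimal runnable sketch of that three-step cycle; the store type and appendVal method are made up for the example:

package main

import (
    "fmt"
    "sync"
)

type store struct {
    mu sync.RWMutex
    m  map[int]string
}

func (s *store) appendVal(key int, suffix string) {
    // Step 1: read a copy of the element under (at least) the read lock.
    s.mu.RLock()
    v := s.m[key]
    s.mu.RUnlock()

    // Step 2: modify the local copy; no lock is needed for this.
    v += suffix

    // Step 3: replace the element under the write lock.
    s.mu.Lock()
    s.m[key] = v
    s.mu.Unlock()
}

func main() {
    s := &store{m: map[int]string{42: "hello "}}
    s.appendVal(42, "world")
    fmt.Println(s.m[42]) // hello world
}

Note that if two goroutines run appendVal for the same key concurrently, one update can still be lost between steps (1) and (3); when that matters, hold the write lock across all three steps instead.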
In contrast, elements of other compound types — arrays (and slices) and fields of struct types — do not have the restriction maps have: provided the storage of the "enclosing" variable is not relocated, it is fine to change its different elements concurrently from different goroutines.
Since the only way to change the value associated with a key in a map is to reassign the changed value to that key, every modification is a write, so you have to obtain the write lock; simply using the read lock will not be sufficient.

Difference between returning a pointer and a value in initialization methods [duplicate]

Consider the following struct:
type Queue struct {
    Elements []int
}
What would be the difference between:
func NewQueue() Queue {
    queue := Queue{}
    return queue
}
and
func NewQueue() *Queue {
    queue := &Queue{}
    return queue
}
To me they seem practically the same (and in fact, trying them out with some enqueueing and dequeueing yields the same results), but I still see both usages in the wild, so perhaps one is preferable.
It's possible to return a value and then have the caller call methods that have a pointer receiver. However, if the caller is always going to want to use pointers, because the object is big or because methods need to modify it in place, you might as well return a pointer. Pointers vs. values is a common question in Go, and there's an answer trying to break down when to use one or the other.
In the specific case of a slice-backed Queue type, it's pretty small and fast to copy as a value, but if you want to be able to copy it around and have everyone see the same data whichever copy is accessed, you're going to need to use a pointer, because a slice is really a little struct of start pointer, length, and capacity, and those change when you reslice or grow it. If this is a surprise, the Go blog posts on the mechanics of append and slice usage and internals could be useful reading.
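To make that concrete, here is a small runnable sketch; the Enqueue method is made up for the example:

package main

import "fmt"

type Queue struct {
    Elements []int
}

func (q *Queue) Enqueue(v int) {
    q.Elements = append(q.Elements, v)
}

func main() {
    q1 := Queue{}
    q2 := q1      // copies the slice header (pointer, length, capacity)
    q1.Enqueue(1) // q1's header changes; q2's copy does not
    fmt.Println(len(q1.Elements), len(q2.Elements)) // 1 0

    p1 := &Queue{}
    p2 := p1      // copies only the pointer; both name the same Queue
    p1.Enqueue(1)
    fmt.Println(len(p1.Elements), len(p2.Elements)) // 1 1
}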
If your queue isn't for sharing or passing around but for using locally in a single function, you could provide an append-style interface where operations return a modified queue, but at that point maybe you just want to use slice tricks directly.
(If your queue is meant to be used concurrently, think hard about using a buffered channel. It might not be exactly what you're imagining, but a lot of the tricky bits have already been figured out for you by the implementers.)
Also, if Queue is really just a slice with methods added, you can make it type Queue []int.
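A minimal sketch of that variant, with hypothetical Enqueue and Dequeue methods (Dequeue omits the empty-queue check for brevity):

type Queue []int

func (q *Queue) Enqueue(v int) {
    *q = append(*q, v)
}

func (q *Queue) Dequeue() int {
    v := (*q)[0]
    *q = (*q)[1:]
    return v
}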

In terms of design and when writing a library, when should I use a pointer as an argument, and when should I not?

Sorry if my question seems stupid. My background is in PHP, Ruby, Python, Lua and similar languages, and I have no understanding of pointers in real-life scenarios.
From what I've read on the Internet and what I've got as responses in a question I asked (When is a pointer idiomatic?), I have understood that:
Pointers should be used when copying large data. Instead of getting the whole object hierarchy, receive its address and access it.
Pointers have to be used when you have a function on a struct that modifies it.
So, pointers seem like a great thing: I should just always receive them as function arguments, because they are so lightweight, and it's okay if I somehow end up not needing to modify anything on the struct.
However, looking at that statement intuitively, I can feel that it sounds very creepy, and yet I don't know why.
So, as someone who is designing a struct and its related functions, or just functions, when should I receive a pointer? When should I receive a value, and why?
In other words, when should my NewAuthor method return &Author{ ... }, and when should it return Author{ ... }? When should my function get a pointer to an author as an argument, and when should it just get the value (a copy) of type Author?
There are tradeoffs for both pointers and values.
Generally speaking, a pointer points to some other region of memory in the system, be it the stack of the function that wants to pass a pointer to one of its local variables, or some place on the heap.
func A() {
    i := 25
    B(&i) // A sets up a stack frame to call B,
          // copying the address of i so B can look it up later.
    // After B returns, i is equal to 30.
}

func B(i *int) {
    // Here, i points into A's stack frame.
    // To execute this line, we look at our variable i,
    // take the memory address it holds, and then load from that
    // address to get the value 25.
    // That address may be on another page of memory,
    // forcing a lookup from main memory (which is slow).
    println(10 + (*i))
    // Since we have the address of A's local variable, we can modify it.
    *i = 30
}
Pointers require me to dereference them constantly whenever I want to see the data they point to. Sometimes you don't care; other times it matters a lot. It really depends on the application.
If that pointer has to be dereferenced a lot (i.e. you pass in a number to use in a bunch of different calculations), then you keep paying that cost.
Compared to using values:
func A() {
    i := 25
    B(i) // A sets up the stack frame to call B, copying in the value 25.
    // i is still 25, because A gave B a copy of the value, not the address.
}

func B(i int) {
    // Here, i is simply on B's stack; nothing extra is needed to use it.
    println(10 + i)
    // Since i is a value on B's stack, this modification is not visible outside B's scope.
    i = 30
}
Since there's nothing to dereference, it's basically free to use the local variable.
The downside of passing values shows up when the values are large, because copying data to the stack isn't free. For an int it's a wash, because a pointer is itself int-sized. For a struct or an array, you are copying all the data.
Also, large objects on the stack can make the stack extra big. Go handles this well with stack re-allocation, but in high performance scenarios, it may be too much of an impact to performance.
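As a tiny illustration of the copy cost, with a made-up Big type:

type Big struct {
    data [4096]byte
}

func byValue(b Big) byte    { return b.data[0] } // copies 4 KiB per call
func byPointer(b *Big) byte { return b.data[0] } // copies one pointer (8 bytes on 64-bit)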
There's a data safety aspect as well (can't modify something I pass by value), but I don't feel that is usually an issue in most code bases.
Basically, if your problem was already solvable in Ruby, Python, or another language without value types, then these performance nuances don't matter much.
In general, passing structs as pointers will usually do "the right thing" while learning the language.
For all other types, or things that you want to keep as read-only, pass values.
There are exceptions to that rule, but it's best that you learn those as needs arise rather than try to redefine your world all at once. If that makes sense.
Simply put, you can use pointers anywhere you want, but sometimes you don't want your data changed: it may stand for abstract data that nothing should modify. In that case, just pass by value and let the compiler do its job.

Why does Go forbid taking the address of (&) map member, yet allows (&) slice element?

Go doesn't allow taking the address of a map member:
mm := make(map[string]int)

// If I do this:
p := &mm["abc"]
// the compiler reports: cannot take the address of mm["abc"]
The rationale is that if Go allowed taking this address, then when the map's backing store grows or shrinks, the address could become invalid, confusing the user.
But a slice's backing array also gets relocated when the slice outgrows its capacity, yet Go allows us to take the address of a slice element:
type Test struct {
    id   int
    name string
}

a := make([]Test, 5)
a[0] = Test{1, "dsfds"}
a[1] = Test{2, "sdfd"}
a[2] = Test{3, "dsf"}
addr1 := reflect.ValueOf(&a[2]).Pointer()
fmt.Println("Address of a[2]: ", addr1)

a = append(a, Test{4, "ssdf"})
addrx := reflect.ValueOf(&a[2]).Pointer()
fmt.Println("Address of a[2] After Append:", addrx)

Note that after the append, the first address no longer points at the current slice's element. One run printed:

Address of a[2]: 833358258224
Address of a[2] After Append: 833358266416
Why is Go designed like this? What is special about taking address of slice element?
There is a major difference between slices and maps: Slices are backed by a backing array and maps are not.
If a map grows or shrinks a potential pointer to a map element may become a dangling pointer pointing into nowhere (uninitialised memory). The problem here is not "confusion of the user" but that it would break a major design element of Go: No dangling pointers.
If a slice runs out of capacity, a new, larger backing array is created and the old backing array is copied into it; the old backing array nevertheless continues to exist. Thus any pointers obtained from the "ungrown" slice, pointing into the old backing array, are still valid pointers to valid memory.
If you have a slice still pointing to the old backing array (e.g. because you made a copy of the slice before growing the slice beyond its capacity) you still access the old backing array. This has less to do with pointers of slice elements, but slices being views into arrays and the arrays being copied during slice growth.
Note that there is no "reducing the backing array of a slice" during slice shrinkage.
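A short runnable demonstration of the old backing array outliving the growth:

package main

import "fmt"

func main() {
    a := make([]int, 1, 1) // len 1, cap 1: the next append must reallocate
    a[0] = 1
    p := &a[0] // points into the original backing array

    a = append(a, 2) // grows: the data is copied to a new, larger array

    a[0] = 100      // writes to the new backing array
    fmt.Println(*p) // 1: p still points into the old array, which is still valid memory
    fmt.Println(a[0], a[1]) // 100 2
}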
A fundamental difference between a map and a slice is that a map is a dynamic data structure that moves the values it contains as it grows. The specific implementation of Go maps may even grow incrementally, a little bit during insert and delete operations, until all values have been moved to a bigger memory structure. So you may delete a value and suddenly another value may move. A slice, on the other hand, is just an interface/pointer to a subarray. A slice never grows: the append function may copy a slice into another slice with more capacity, but it leaves the old slice intact, and it is also a function rather than just an indexing operator.
In the words of the map implementor himself:
https://www.youtube.com/watch?v=Tl7mi9QmLns&feature=youtu.be&t=21m45s
"It interferes with this growing procedure, so if I take the address
of some entry in the bucket, and then I keep that entry around for a
long time and in the meantime the map grows, then all of a sudden that
pointer points to an old bucket and not a new bucket and that pointer
is now invalid, so it's hard to provide the ability to take the
address of a value in a map, without constraining how grow works...
C++ grows in a different way, so you can take the address of a bucket"
So, even though &m[x] could have been allowed, and would be useful for short-lived operations (modify the value, then never use that pointer again; in fact the map does exactly this internally), I think the language designers/implementors chose to be on the safe side with maps: not allowing &m[x] avoids subtle bugs in programs that keep the pointer for a long time without realizing it would eventually point to different data than the programmer thought.
See also Why doesn't Go allow taking the address of map value? for related comments.
I've read a bunch of explanations about the difference between slice pointers and map pointers, and it all still seems a tad odd.
Consider this: https://go.dev/play/p/uzADxzdq2EP
I can get a pointer to the zeroth element, but after I append another element, the original pointer is still there; it just no longer points to the zeroth element of the current slice. It points to the original value. Sure, it's not pointing at a nil object, it's pointing at the same object, but it's no longer 'correct', for some version of correct.
I'm not sure what my point is here, other than that it's just... odd.

Resources