Do Rust fat pointers consume less memory than thin pointers within vectors?

I was reading this Reddit comment which compares thin and fat pointers.
To be specific, the downsides of thin pointers within vectors:
User A heap allocates each object individually and takes many thin
pointers to them. For user A thin pointers are pretty good, since the
vtables are stored once, and the pointers are very lightweight. Now
consider user B, which allocates the objects of the same type on
vectors, and never views them polymorphically, or only occasionally
does so. User B is paying for something that it does not use.
And the upside of fat pointers within vectors:
In Rust, one only pays this price when one actually goes ahead and
creates a trait object. This is perfect for user B. For user A, this
is far from perfect, since now its pointers are twice as big.
I don't understand why fat pointers are better for user B, because within vectors their memory use seems to be the same or higher, not lower.
Fat pointer cost = 2 pointers + vtable size + data size
(VTABLE PTR -> VTABLE, DATA PTR -> DATA)
Thin pointer cost = 2 pointers + vtable size + data size
(PTR -> [VTABLE PTR -> VTABLE, DATA])
Given this layout, a vector of thin pointers will roughly take up capacity * 1 pointer in size and a vector of fat pointers will take capacity * 2 pointers in size. Overall required memory remains the same. What am I missing here, or is the author incorrect in his assessment?

You can have multiple references to the same object, either via immutably borrowing or via shared ownership like Arc. For fat pointers, that means doubling the size of each reference. If you have lots of polymorphic references, fat pointers will use more space than thin pointers.
However, fat pointers are only used if runtime polymorphism is needed - references to concrete types use thin pointers instead, since they know the type involved. Since Rust tends to use generics more than polymorphism, fat pointers aren't used very frequently. Conversely, thin pointers require adding a hidden vtable field to every structure that may be used polymorphically, which for most code would be a waste.
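The size difference between thin and fat references is easy to observe directly with std::mem::size_of; a minimal sketch:

```rust
use std::mem::size_of;

fn main() {
    // A reference to a concrete (Sized) type is a thin pointer: one word.
    assert_eq!(size_of::<&u64>(), size_of::<usize>());

    // A reference to a trait object is a fat pointer:
    // one word for the data pointer, one for the vtable pointer.
    assert_eq!(size_of::<&dyn std::fmt::Debug>(), 2 * size_of::<usize>());

    // The same holds for owning pointers.
    assert_eq!(size_of::<Box<u64>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<dyn std::fmt::Debug>>(), 2 * size_of::<usize>());
}
```

So a `Vec<Box<u64>>` spends one word per element on pointers, while a `Vec<Box<dyn Debug>>` spends two; the price is only paid where trait objects are actually created.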

Related

What does crossbeam_epoch::Shared::as_raw mean by "Converts the pointer to a raw pointer (without the tag)"?

Can someone translate this into something that makes sense for me:
Converts the pointer to a raw pointer (without the tag).
What is the difference between a pointer and a raw pointer?
The Stack Overflow raw-pointer tag says neither "smart" nor "shared" which again is mystifying.
What are Crossbeam's Shared::as_raw's "tags" all about?
crossbeam_epoch::Shared is a smart pointer, that is, a pointer plus extra stuff. In C++ or Rust, "smart pointer" is the term for a pointer wrapper that adds any of the following:
Ownership information
Lifetime information
Packing extra data in unused bits
Copy-on-write behavior
Reference counting
In that context, a raw pointer is just the wrapped pointer, without all the extra stuff.
crossbeam_epoch::Shared fits (among others) in the “Packing extra data in unused bits” category above. Most data in modern computers is aligned, that is, addresses are a multiple of some power of two. This means that all low bits of the addresses are always 0. One can use that fact to store a few extra bits of information in a pointer.
This extra data is called a tag by this particular library, though that term is not nearly as widespread as raw pointer.
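The low-bit tagging idea itself can be sketched in a few lines. This is a made-up illustration, not Crossbeam's implementation; the tag/untag helpers are hypothetical names:

```rust
// Sketch of low-bit pointer tagging. Because u64 is at least 2-byte
// aligned, the lowest address bit is always 0 and can carry one flag.
fn tag(ptr: *const u64, flag: bool) -> usize {
    let addr = ptr as usize;
    // Alignment guarantees the low bit is free.
    debug_assert_eq!(addr % std::mem::align_of::<u64>(), 0);
    addr | (flag as usize)
}

fn untag(tagged: usize) -> (*const u64, bool) {
    // Mask the tag bit off to recover the raw pointer.
    ((tagged & !1) as *const u64, tagged & 1 == 1)
}

fn main() {
    let x: u64 = 42;
    let tagged = tag(&x as *const u64, true);
    let (raw, flag) = untag(tagged);
    assert_eq!(raw, &x as *const u64);
    assert!(flag);
    assert_eq!(unsafe { *raw }, 42);
}
```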

Does Rust box the individual items that are added to a vector?

According to the Rust documentation:
Vectors always allocate their data on the heap.
As I understand this, it means that:
Rust will allocate enough memory on the heap to store the type T in a contiguous fashion.
Rust will not individually box the items as they are placed into the vector.
In other words, if I add a few integers to a vector, while the Vec will allocate enough storage to store those integers, it's not also going to box those integers; introducing another layer of indirection.
I'm not sure how I can illustrate or confirm this with code examples but any help is appreciated.
Yes, Vec<T> will store all items in a contiguous buffer rather than boxing them individually. The documentation states:
A contiguous growable array type, written Vec<T> but pronounced 'vector.'
Note that it is also possible to slice a vector, to get a &[T] (slice). Its documentation, again, confirms this:
A dynamically-sized view into a contiguous sequence, [T].
Slices are a view into a block of memory represented as a pointer and a length.
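The contiguity is also observable from the element addresses themselves; a small sketch:

```rust
use std::mem::size_of;

fn main() {
    let v = vec![10i32, 20, 30];
    let base = v.as_ptr() as usize;
    // Each element sits exactly size_of::<i32>() bytes past the previous
    // one: no per-element Box, no extra indirection.
    for (i, item) in v.iter().enumerate() {
        assert_eq!(item as *const i32 as usize, base + i * size_of::<i32>());
    }
}
```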

In terms of design and when writing a library, when should I use a pointer as an argument, and when should I not?

Sorry if my question seems stupid. My background is in PHP, Ruby, Python, Lua and similar languages, and I have no understanding of pointers in real-life scenarios.
From what I've read on the Internet and what I've got as responses in a question I asked (When is a pointer idiomatic?), I have understood that:
Pointers should be used when copying large data. Instead of getting the whole object hierarchy, receive its address and access it.
Pointers have to be used when you have a function on a struct that modifies it.
So, pointers seem like a great thing: I should just always get them as function arguments because they are so lightweight, and it's okay if I somehow end up not needing to modify anything on the struct.
However, looking at that statement intuitively, I can feel that it sounds very creepy, and yet I don't know why.
So, as someone who is designing a struct and its related functions, or just functions, when should I receive a pointer? When should I receive a value, and why?
In other words, when should my NewAuthor method return &Author{ ... }, and when should it return Author{ ... }? When should my function get a pointer to an author as an argument, and when should it just get the value (a copy) of type Author?
There's tradeoffs for both pointers and values.
Generally speaking, pointers will point to some other region of memory in the system. Be it the stack of the function that wants to pass a pointer to a local variable or some place on the heap.
func A() {
    i := 25
    B(&i) // A sets up a stack frame to call B,
          // copying the address of i so B can look it up later.
    // After B returns, i is equal to 30.
}
func B(i *int) {
    // Here, i points into A's stack frame.
    // To execute this line, I look at my variable i,
    // see the memory address it holds, then read that address to get the value 25.
    // That address may be on another page of memory,
    // forcing a lookup from main memory (which is slow).
    println(10 + (*i))
    // Since I have the address of A's local variable, I can modify it.
    *i = 30
}
Pointers require me to dereference them whenever I want to see the data they point to. Sometimes you don't care; other times it matters a lot. It really depends on the application.
If that pointer has to be dereferenced a lot (e.g. you pass in a number used in a bunch of different calculations), you keep paying that cost.
Compared to using values:
func A() {
    i := 25
    B(i) // A sets up the stack frame to call B, copying in the value 25.
    // i is still 25, because A gave B a copy of the value, not the address.
}
func B(i int) {
    // Here, i is simply on B's stack. I don't have to do anything extra to use it.
    println(10 + i)
    // Since i is a value on B's stack, modifications are not visible outside B's scope.
    i = 30
}
Since there's nothing to dereference, it's basically free to use the local variable.
The downside of passing values appears when those values are large, because copying data to the stack isn't free.
For an int it's a wash, because pointers are int-sized. For a struct or an array, you are copying all the data.
Also, large objects on the stack can make the stack extra big. Go handles this well with stack re-allocation, but in high performance scenarios, it may be too much of an impact to performance.
There's a data safety aspect as well (can't modify something I pass by value), but I don't feel that is usually an issue in most code bases.
Basically, if your problem was already solvable in Ruby, Python, or another language without value types, then these performance nuances don't matter much.
In general, passing structs as pointers will usually do "the right thing" while learning the language.
For all other types, or things that you want to keep as read-only, pass values.
There are exceptions to that rule, but it's best that you learn those as needs arise rather than try to redefine your world all at once. If that makes sense.
Simply put, you can use pointers anywhere you want, but sometimes you don't want to change your data. It may stand for abstract data that you don't want to copy explicitly. In that case, just pass by value and let the compiler do its job.

How can I create an owning pointer to an unsized type?

Dealing with values of type str in Rust is clumsy because they do not implement the trait Sized. Therefore, they can only be accessed by pointer.
For my application, using ordinary pointers with lifetimes is not very helpful. Rather, I want an owning fat pointer that guarantees that the contained object will last as long as the pointer does (and no longer), but allows holding values of unknown size.
Box<T> works for an unsized T; thus Box<str>, Box<[T]> and so forth. The important distinction to note between Box<str> and String is that the latter has a capacity member as well, increasing its memory usage by one word but allowing for efficient appending as it may not need to reallocate for every push, whereas a similar method on a Box<str> would need to. The same is true of Box<[T]> versus Vec<T>, with the former being a fixed-size slice while the latter is conveniently growable. Unlike Box<str>, Box<[T]> is actually used in real life; the vec! macro uses it for efficiency, as a Box<[T]> can be written out literally and then converted to a Vec<T> at no cost.
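The one-word difference between the owning fat pointer and its growable counterpart can be verified directly; a sketch:

```rust
use std::mem::size_of;

fn main() {
    // String = pointer + length + capacity: three words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Box<str> = pointer + length: two words (an owning fat pointer, no capacity).
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    // Converting drops the spare capacity but keeps the contents.
    let s = String::from("hello");
    let boxed: Box<str> = s.into_boxed_str();
    assert_eq!(&*boxed, "hello");

    // The same relationship holds for Vec<T> and Box<[T]>.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());
}
```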

OpenCL performance: using arrays of primitives vs arrays of structures

I'm trying to use several arrays of doubles in a kernel that are all the same length. Instead of passing each double* in as a separate argument, I know I can define a structure in the .cl file that holds several doubles and then just pass into the kernel one pointer for an array of the structures instead.
Will the performance be different for the two ways? Please correct me if I am wrong, but I think passing individual double pointers means the access can be coalesced. Will accessing the structures also be coalesced?
As long as your structures don't contain any pointers, what you say is absolutely possible. The primary impact is generally, as you've already considered, the effect this has on the coalescing of memory operations. How big an effect is down to your memory access pattern, the size of your struct, and the device you're running on. More details would be needed to describe this more fully.
That said, one instance where I've used a struct in this way very successfully is where the element being read is the same for all work items in a work group. In this case there is no penalty on my hardware (an NVIDIA GTX 570). It is also worth remembering that in some cases the added latency introduced by serialised memory operations can be hidden. In the CUDA world this is achieved by having high occupancy for a problem with high arithmetic intensity.
Finally it is worth pointing out that the semantic clarity of using a struct can have a benefit in and of itself. You'll have to consider this against any performance cost for your particular problem. My advice is to try it and see; it is very difficult to predict the impact of these issues ahead of time.
Theoretically the performance is the same. However, if you access some of the members more often than others, using several segregated arrays will perform much better, due to CPU cache locality. On the other hand, most operations become more awkward when the data is split across several arrays.
The structures and the single elements will have the exact same performance.
Suppose you have a big array of doubles, and the first work item uses 0, 100, 200, 300, ... while the next one uses 1, 101, 201, 301, ...
If you have a structure of 100 doubles, then in memory the first structure comes first (0-99), then the second (100-199), and so on. The kernels access exactly the same memory in the same places; the only difference is how you define the memory abstraction.
In the more generic case of a structure with different element types (char, int, double, bool, ...), the alignment may not match that of a single array of data, but accesses will still be "semi-coalesced". I would even bet the performance is still the same.
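This layout equivalence between an array of all-double structs and one flat array can be checked outside OpenCL too. A sketch in Rust, with a made-up Block struct of 4 doubles standing in for the 100-double case:

```rust
use std::mem::size_of;

// Hypothetical struct: repr(C) so the field layout matches a plain array.
#[repr(C)]
struct Block {
    values: [f64; 4],
}

fn main() {
    // An array of Blocks occupies exactly the same bytes,
    // in the same order, as one flat array of doubles.
    assert_eq!(size_of::<[Block; 3]>(), size_of::<[f64; 12]>());

    let blocks = [
        Block { values: [0.0, 1.0, 2.0, 3.0] },
        Block { values: [4.0, 5.0, 6.0, 7.0] },
    ];
    // Reinterpret the struct array as a flat slice of doubles.
    let flat: &[f64] =
        unsafe { std::slice::from_raw_parts(blocks.as_ptr() as *const f64, 8) };
    assert_eq!(flat[5], 5.0); // element 1 of the second Block
}
```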
