What is a "fat pointer"? - pointers

I've read the term "fat pointer" in several contexts already, but I'm not sure what exactly it means and when it is used in Rust. The pointer seems to be twice as large as a normal pointer, but I don't understand why. It also seems to have something to do with trait objects.

The term "fat pointer" is used to refer to references and raw pointers to dynamically sized types (DSTs) – slices or trait objects. A fat pointer contains a pointer plus some information that makes the DST "complete" (e.g. the length).
Most commonly used types in Rust are not DSTs but have a fixed size known at compile time. These types implement the Sized trait. Even types that manage a heap buffer of dynamic size (like Vec<T>) are Sized, as the compiler knows the exact number of bytes a Vec<T> instance will take up on the stack. There are currently four different kinds of DSTs in Rust.
Slices ([T] and str)
The type [T] (for any T) is dynamically sized (so is the special "string slice" type str). That's why you usually only see it as &[T] or &mut [T], i.e. behind a reference. This reference is a so-called "fat pointer". Let's check:
dbg!(size_of::<&u32>());
dbg!(size_of::<&[u32; 2]>());
dbg!(size_of::<&[u32]>());
This prints (with some cleanup):
size_of::<&u32>() = 8
size_of::<&[u32; 2]>() = 8
size_of::<&[u32]>() = 16
So we see that a reference to a normal type like u32 is 8 bytes large, as is a reference to an array [u32; 2]. Those two types are not DSTs. But as [u32] is a DST, the reference to it is twice as large. In the case of slices, the additional data that "completes" the DST is simply the length. So one could say the representation of &[u32] is something like this:
struct SliceRef {
ptr: *const u32,
len: usize,
}
Trait objects (dyn Trait)
When using traits as trait objects (i.e. type erased, dynamically dispatched), these trait objects are DSTs. Example:
trait Animal {
fn speak(&self);
}
struct Cat;
impl Animal for Cat {
fn speak(&self) {
println!("meow");
}
}
dbg!(size_of::<&Cat>());
dbg!(size_of::<&dyn Animal>());
This prints (with some cleanup):
size_of::<&Cat>() = 8
size_of::<&dyn Animal>() = 16
Again, &Cat is only 8 bytes large because Cat is a normal type. But dyn Animal is a trait object and therefore dynamically sized. As such, &dyn Animal is 16 bytes large.
In the case of trait objects, the additional data that completes the DST is a pointer to the vtable (the vptr). I cannot fully explain the concept of vtables and vptrs here, but they are used to call the correct method implementation in this virtual dispatch context. The vtable is a static piece of data that basically only contains a function pointer for each method. With that, a reference to a trait object is basically represented as:
struct TraitObjectRef {
data_ptr: *const (),
vptr: *const (),
}
(This is different from C++, where the vptr for abstract classes is stored within the object. Both approaches have advantages and disadvantages.)
Custom DSTs
It's actually possible to create your own DSTs by having a struct where the last field is a DST. This is rather rare, though. One prominent example is std::path::Path.
A reference or pointer to the custom DST is also a fat pointer. The additional data depends on the kind of DST inside the struct.
Exception: Extern types
In RFC 1861, the extern type feature was introduced. Extern types are also DSTs, but pointers to them are not fat pointers. Or more exactly, as the RFC puts it:
In Rust, pointers to DSTs carry metadata about the object being pointed to. For strings and slices this is the length of the buffer, for trait objects this is the object's vtable. For extern types the metadata is simply (). This means that a pointer to an extern type has the same size as a usize (ie. it is not a "fat pointer").
But if you are not interacting with a C interface, you probably won't ever have to deal with these extern types.
Above, we've seen the sizes for immutable references. Fat pointers work the same for mutable references, immutable raw pointers and mutable raw pointers:
size_of::<&[u32]>() = 16
size_of::<&mut [u32]>() = 16
size_of::<*const [u32]>() = 16
size_of::<*mut [u32]>() = 16

Related

How can I load all entries of a Vec<T> of arbitrary length onto the stack?

I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.
You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]
fn example<T>(v: Vec<T>) {
let s: [T] = *v.into_boxed_slice();
dbg!(std::mem::size_of_val(&s));
}
fn main() {
let x = vec![42; 100];
example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec
Sure you can.
Vec [...] seems to stride over pointers on the heap, dereferencing at each entry
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
for fast access
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.

Why do structs and a mutable structs have different default equality operators?

I have the following code:
julia> struct Point
x
y
end
julia> Point(1,2) == Point(1,2)
true
julia> mutable struct Points
x
y
end
julia> Points(1,2) == Points(1,2)
false
Why are the two objects equal when it is a normal struct but not equal when it is a mutable struct?
The reason is that by default == falls back to ===. Now the way === works is (citing the documentation):
First the types of x and y are compared. If those are identical,
mutable objects are compared by address in memory and immutable objects (such as numbers)
are compared by contents at the bit level.
So for Point, which is immutable, the comparison of contents is performed (and it is identical in your case). While Points is mutable, so memory addresses of passed objects are compared and they are different as you have created two distinct objects.
Bogumił Kamiński is correct, but you might as why that difference in the definition of === exists between mutable and immutable types. The reason is that your immutable structs Point are actually indistinguishable. Since they can't change, their values will always be the same, and so they might as well be two names for the same object. Therefore, In the language they are defined by only their value.
In contrast, for mutabke structs there are at least two ways you can distinguish them. First, since mutable structs can't usually be stack allocated, they have a memory location, and you can compare the memory location of the two mutable structs and see they are different. Second, you can simply mutate one of them, and see that only one object changes whereas the other doesn't.
So, the reason for the difference in the definition of === is that two identitcal mutable structs can be distinguished, but two immutable ones cannot.

Does Rust protect me from iterator invalidation when pushing to a vector while iterating over it?

Does Rust protect me from iterator invalidation here or am I just lucky with realloc? What guarantees are given for an iterator returned for &'a Vec<T>?
fn main() {
let mut v = vec![0; 2];
println!("capacity: {}", v.capacity());
{
let v_ref = &mut v;
for _each in v_ref.clone() {
for _ in 0..101 {
(*v_ref).push(1); // ?
}
}
}
println!("capacity: {}", v.capacity());
}
In Rust, most methods take an &self - a reference to self. In most circumstances, a call like some_string.len() internally "expands" to something like this:
let a: String = "abc".to_string();
let a_len: usize = String::len(&a); // This is identical to calling `a.len()`.
However, consider a reference to an object: a_ref, which is an &String that references a. Rust is smart enough to determine whether a reference needs to be added or removed, like we saw above (a becomes &a); In this case, a_ref.len() expands to:
let a: String = "abc".to_string();
let a_ref: &String = &a;
let a_len: usize = String::len(a_ref); // This is identical to calling `a_ref.len();`. Since `a_ref` is a reference already, it doesn't need to be altered.
Notice that this is basically equivalent to the original example, except that we're using an explicitly-set reference to a rather than a directly.
This means that v.clone() expands to Vec::clone(&v), and similarly, v_ref.clone() expands to Vec::clone(v_ref), and since v_refis &v (or, specifically, &mut v), we can simplify this back into Vec::clone(&v). In other words, these calls are equivalent - calling clone() on a basic reference (&) to an object does not clone the reference, it clones the referenced object.
In other words, Tamas Hedgeus' comment is correct: You are iterating over a new vector, which contains elements that are clones of the elements in v. The item being iterated over in your for loop is not a &Vec, it's a Vec that is separate from v, and therefore iterator invalidation is not an issue.
As for your question about the guarantees Rust provides, you'll find that Rust's borrow checker handles this rather well without any strings attached.
If you were to remove clone() from the for loop, though, you would receive an error message, use of moved value: '*v_ref', because v_ref is considered 'moved' into the for loop when you iterate over it, and cannot be used for the remainder of the function; to avoid this, the iter function creates an iterator object that only borrows the vector, allowing you to reuse the vector after the loop ends (and the iterator is dropped). And if you were to try iterating over and mutating v without the v_ref abstraction, the error reads cannot borrow 'v' as mutable because it is also borrowed as immutable. v is borrowed immutably within the iterator spawned by v.iter() (which has type signature of fn iter(&self) -> Iter<T> - note, it makes a borrow to the vector), and will not allow you to mutate the vector as a result of Rust's borrow checker, until the iterator is dropped (at the end of the for loop). However, since you can have multiple immutable references to a single object, you can still read from the vector within the for loop, just not write into it.
If you need to mutate an element of a vector while iterating over the vector, you can use iter_mut, which returns mutable references to one element at a time and lets you change that element only. You still cannot mutate the iterated vector itself with iter_mut, because Rust ensures that there is only one mutable reference to an object at a time, as well as ensuring there are no mutable references to an object in the same scope as immutable references to that object.

Implementing custom primitive types in Julia

The Julia documentation says:
A primitive type is a concrete type whose data consists of plain old
bits. Classic examples of primitive types are integers and
floating-point values. Unlike most languages, Julia lets you declare
your own primitive types, rather than providing only a fixed set of
built-in ones. In fact, the standard primitive types are all defined
in the language itself:
I'm unable to find an example of how to do this, though, either in the docs or in the source code or anywhere else. What I'm looking for is an example of how to declare a primitive type, and how to subsequently implement a function or method on that type that works by manipulating those bits.
Is anyone able to point me towards an example? Thanks.
Edit: It's clear how to declare a primitive type, as there are examples immediately below the above quote in the doc. I'm hoping for information about how to subsequently manipulate them. For example, say I wanted to (pointlessly) implement my own primitive type MyInt8. I could declare that with primitive type MyInt8 <: Signed 8 end. But how would I subsequently implement a function myplus that manipulated the bits within Myint8?
PS in case it helps, the reason I'm asking is not because I need to do anything specific in Julia; I'm designing my own language for fun, and am researching how other languages implement various things.
# Declare the new type.
primitive type MyInt8 <: Signed 8 end
# A constructor to create values of the type MyInt8.
MyInt8(x :: Int8) = reinterpret(MyInt8, x)
# A constructor to convert back.
Int8(x :: MyInt8) = reinterpret(Int8, x)
# This allows the REPL to show values of type MyInt8.
Base.show(io :: IO, x :: MyInt8) = print(io, Int8(x))
# Declare an operator for the new type.
import Base: +
+ (a :: MyInt8, b :: MyInt8) = MyInt8(Int8(a) + Int8(b))
The key function here is reinterpret. It allows the bit representation of an Int8 to be treated as the new type.
To store a value with a custom bit layout, inside the MyInt8 constructor, you could perform any of the standard bit manipulation functions on the Int8 before 'reinterpreting' them as a MyInt8.

Explaining C declarations in Rust

I need to rewrite these C declarations in Go and Rust for a set of practice problems I am working on. I figured out the Go part, but I am having trouble with the Rust part. Any ideas or help to write these in Rust?
double *a[n];
double (*b)[n];
double (*c[n])();
double (*d())[n];
Assuming n is a constant:
let a: [*mut f64, ..n]; // double *a[n];
let b: *mut [f64, ..n]; // double (*b)[n];
let c: [fn() -> f64, ..n]; // double (*c[n])();
fn d() -> *mut [f64, ..n]; // double (*d())[n];
These are rather awkward and unusual types in any language. Rust's syntax, however, makes these declarations a lot easier to read than C's syntax does.
Note that d in C is a function declaration. In Rust, external function declarations are only allowed in extern blocks (see the FFI guide).
The answer depends on what, exactly, the * is for. For example, is the first one being used as an array of pointers to doubles, or is it an array of arrays of doubles? Are the pointers nullable or not?
Also, is n a constant or not? If it is, then you want an array; if it's not, you want a Vec.
Also also, are these global or local declarations? Are they function arguments? There's different syntax involved for each.
Frankly, without more context, it's impossible to answer this question with any accuracy. Instead, I will give you the following:
The Rust documentation contains all the information you'll need, although it's spread out a bit. Check the reference and any appropriate-looking guides. The FFI Guide is probably worth looking at.
cdecl is a website that will unpick C declarations if that's the part you're having difficulty with. Just note that you'll have to remove the semicolon and the n or it won't parse.
The floating point types in Rust are f32 and f64, depending on whether you're using float or double. Also, don't get caught: int in Rust is not equivalent to int in C. Prefer explicitly-sized types like i32 or u64, or types from libc like c_int. int and uint should only be used with explicitly pointer-sized values.
Normally, you'd write a reference to a T as &T or &mut T, depending on desired mutability (default in C is mutable, default in Rust is immutable).
If you want a nullable reference, use Option<&T>.
If you are trying to use these in a context where you start getting complaints about needing "lifetimes"... well, you're just going to have to learn the language. At that point, simple translation isn't going to work very well.
In Rust, array types are written as brackets around the element type. So an "array of doubles" would be [f64], an array of size n would be [f64, ..n]. Typically, however, the actual equivalent to, say, double[] in C would be &[f64]; that is, a reference to an array, rather then the actual contents of the array.
Use of "raw pointers" is heavily discouraged in Rust, and you cannot use them meaningfully outside of unsafe code. In terms of syntax, a pointer to T is *const T or *mut T, depending on whether it's a pointer to constant or mutable data.
Function pointers are just written as fn (Args...) -> Result. So a function that takes nothing and returns a double would be fn () -> f64.

Resources