Explaining C declarations in Rust - pointers

I need to rewrite these C declarations in Go and Rust for a set of practice problems I am working on. I figured out the Go part, but I am having trouble with the Rust part. Any ideas or help to write these in Rust?
double *a[n];
double (*b)[n];
double (*c[n])();
double (*d())[n];

Assuming n is a constant:
let a: [*mut f64, ..n]; // double *a[n];
let b: *mut [f64, ..n]; // double (*b)[n];
let c: [fn() -> f64, ..n]; // double (*c[n])();
fn d() -> *mut [f64, ..n]; // double (*d())[n];
These are rather awkward and unusual types in any language. Rust's syntax, however, makes these declarations a lot easier to read than C's syntax does.
Note that d in C is a function declaration. In Rust, external function declarations are only allowed in extern blocks (see the FFI guide).

The answer depends on what, exactly, the * is for. For example, is the first one being used as an array of pointers to doubles, or is it an array of arrays of doubles? Are the pointers nullable or not?
Also, is n a constant or not? If it is, then you want an array; if it's not, you want a Vec.
Also also, are these global or local declarations? Are they function arguments? There's different syntax involved for each.
Frankly, without more context, it's impossible to answer this question with any accuracy. Instead, I will give you the following:
The Rust documentation contains all the information you'll need, although it's spread out a bit. Check the reference and any appropriate-looking guides. The FFI Guide is probably worth looking at.
cdecl is a website that will unpick C declarations if that's the part you're having difficulty with. Just note that you'll have to remove the semicolon and the n or it won't parse.
The floating point types in Rust are f32 and f64, depending on whether you're using float or double. Also, don't get caught: int in Rust is not equivalent to int in C. Prefer explicitly-sized types like i32 or u64, or types from libc like c_int. int and uint should only be used with explicitly pointer-sized values.
Normally, you'd write a reference to a T as &T or &mut T, depending on desired mutability (default in C is mutable, default in Rust is immutable).
If you want a nullable reference, use Option<&T>.
If you are trying to use these in a context where you start getting complaints about needing "lifetimes"... well, you're just going to have to learn the language. At that point, simple translation isn't going to work very well.
In Rust, array types are written as brackets around the element type. So an "array of doubles" would be [f64], an array of size n would be [f64, ..n]. Typically, however, the actual equivalent to, say, double[] in C would be &[f64]; that is, a reference to an array, rather then the actual contents of the array.
Use of "raw pointers" is heavily discouraged in Rust, and you cannot use them meaningfully outside of unsafe code. In terms of syntax, a pointer to T is *const T or *mut T, depending on whether it's a pointer to constant or mutable data.
Function pointers are just written as fn (Args...) -> Result. So a function that takes nothing and returns a double would be fn () -> f64.

Related

Assert that a pointer is aligned to some value

Is there a guaranteed way to assert that a given raw pointer is aligned to some alignment value?
I looked at pointer's aligned_offset function, but the docs state that it is permissible for it to give false negatives (always return usize::MAX), and that correctness cannot depend on it.
I don't want to fiddle with the alignment at all, I just want to write an assertion that will panic if the pointer is unaligned. My motivation is that when using certain low-level CPU intrinsics passing a pointer not aligned to some boundary causes a CPU error, and I'd much rather get a Rust panic message pointing where the bug causing it is located than a SEGFAULT.
An example assertion (not correct according to aligned_offset docs):
#[repr(align(64))]
struct A64(u8);
#[repr(align(32))]
struct A32(u8);
#[repr(align(8))]
struct A8(u8);
fn main() {
let a64 = [A64(0)];
let a32 = [A32(0)];
let a8 = [A8(0), A8(0)];
println!("Assert for 64 should pass...");
assert_alignment(&a64);
println!("Assert for 32 should pass...");
assert_alignment(&a32);
println!("Assert for 8, one of the following should fail:");
println!("- full array");
assert_alignment(&a8);
println!("- offset by 8");
assert_alignment(&a8[1..]);
}
fn assert_alignment<T>(a: &[T]) {
let ptr = a.as_ptr();
assert_eq!(ptr.align_offset(32), 0);
}
Rust playground.
Just to satisfy my own neuroses, I went and checked the the source of ptr::align_offset.
There's a lot of careful work around edge cases (e.g. const-evaluated it always returns usize::MAX, similarly for a pointer to a zero-sized type, and it panics if alignment is not a power of 2). But the crux of the implementation, for your purposes, is here: it takes (ptr as usize) % alignment == 0 to check if it's aligned.
Edit:
This PR is adding a ptr::is_aligned_to function, which is much more readable and also safer and better reviewed than simply (ptr as usize) % alginment == 0 (though the core of it is still that logic).
There's then some more complexity to calculate the exact offset (which may not be possible), but that's not relevant for this question.
Therefore:
assert_eq!(ptr.align_offset(alignment), 0);
should be plenty for your assertion.
Incidentally, this proves that the current rust standard library cannot target anything that does not represent pointers as simple numerical addresses, otherwise this function would not work. In the unlikely situation that the rust standard library is ported to the Intel 8086 or some weird DSP that doesn't represent pointers in the expected way, this function would have to change. But really, do you care for that hypothetical that much?

What is a "fat pointer"?

I've read the term "fat pointer" in several contexts already, but I'm not sure what exactly it means and when it is used in Rust. The pointer seems to be twice as large as a normal pointer, but I don't understand why. It also seems to have something to do with trait objects.
The term "fat pointer" is used to refer to references and raw pointers to dynamically sized types (DSTs) – slices or trait objects. A fat pointer contains a pointer plus some information that makes the DST "complete" (e.g. the length).
Most commonly used types in Rust are not DSTs but have a fixed size known at compile time. These types implement the Sized trait. Even types that manage a heap buffer of dynamic size (like Vec<T>) are Sized, as the compiler knows the exact number of bytes a Vec<T> instance will take up on the stack. There are currently four different kinds of DSTs in Rust.
Slices ([T] and str)
The type [T] (for any T) is dynamically sized (so is the special "string slice" type str). That's why you usually only see it as &[T] or &mut [T], i.e. behind a reference. This reference is a so-called "fat pointer". Let's check:
dbg!(size_of::<&u32>());
dbg!(size_of::<&[u32; 2]>());
dbg!(size_of::<&[u32]>());
This prints (with some cleanup):
size_of::<&u32>() = 8
size_of::<&[u32; 2]>() = 8
size_of::<&[u32]>() = 16
So we see that a reference to a normal type like u32 is 8 bytes large, as is a reference to an array [u32; 2]. Those two types are not DSTs. But as [u32] is a DST, the reference to it is twice as large. In the case of slices, the additional data that "completes" the DST is simply the length. So one could say the representation of &[u32] is something like this:
struct SliceRef {
ptr: *const u32,
len: usize,
}
Trait objects (dyn Trait)
When using traits as trait objects (i.e. type erased, dynamically dispatched), these trait objects are DSTs. Example:
trait Animal {
fn speak(&self);
}
struct Cat;
impl Animal for Cat {
fn speak(&self) {
println!("meow");
}
}
dbg!(size_of::<&Cat>());
dbg!(size_of::<&dyn Animal>());
This prints (with some cleanup):
size_of::<&Cat>() = 8
size_of::<&dyn Animal>() = 16
Again, &Cat is only 8 bytes large because Cat is a normal type. But dyn Animal is a trait object and therefore dynamically sized. As such, &dyn Animal is 16 bytes large.
In the case of trait objects, the additional data that completes the DST is a pointer to the vtable (the vptr). I cannot fully explain the concept of vtables and vptrs here, but they are used to call the correct method implementation in this virtual dispatch context. The vtable is a static piece of data that basically only contains a function pointer for each method. With that, a reference to a trait object is basically represented as:
struct TraitObjectRef {
data_ptr: *const (),
vptr: *const (),
}
(This is different from C++, where the vptr for abstract classes is stored within the object. Both approaches have advantages and disadvantages.)
Custom DSTs
It's actually possible to create your own DSTs by having a struct where the last field is a DST. This is rather rare, though. One prominent example is std::path::Path.
A reference or pointer to the custom DST is also a fat pointer. The additional data depends on the kind of DST inside the struct.
Exception: Extern types
In RFC 1861, the extern type feature was introduced. Extern types are also DSTs, but pointers to them are not fat pointers. Or more exactly, as the RFC puts it:
In Rust, pointers to DSTs carry metadata about the object being pointed to. For strings and slices this is the length of the buffer, for trait objects this is the object's vtable. For extern types the metadata is simply (). This means that a pointer to an extern type has the same size as a usize (ie. it is not a "fat pointer").
But if you are not interacting with a C interface, you probably won't ever have to deal with these extern types.
Above, we've seen the sizes for immutable references. Fat pointers work the same for mutable references, immutable raw pointers and mutable raw pointers:
size_of::<&[u32]>() = 16
size_of::<&mut [u32]>() = 16
size_of::<*const [u32]>() = 16
size_of::<*mut [u32]>() = 16

In (Free) Pascal, can a function return a value that can be modified without dereference?

In Pascal, I understand that one could create a function returning a pointer which can be dereferenced and then assign a value to that, such as in the following (obnoxiously useless) example:
type ptr = ^integer;
var d: integer;
function f(x: integer): ptr;
begin
f := #x;
end;
begin
f(d)^ := 4;
end.
And now d is 4.
(The actual usage is to access part of a quite complicated array of records data structure. I know that a class would be better than an array of nested records, but it isn't my code (it's TeX: The Program) and was written before Pascal implementations supported object-orientation. The code was written using essentially a language built on top of Pascal that added macros which expand before the compiler sees them. Thus you could define some macro m that takes an argument x and expands into thearray[x + 1].f1.f2 instead of writing that every time; the usage would be m(x) := somevalue. I want to replicate this functionality with a function instead of a macro.)
However, is it possible to achieve this functionality without the ^ operator? Can a function f be written such that f(x) := y (no caret) assigns the value y to x? I know that this is stupid and the answer is probably no, but I just (a) don't really like the look of it and (b) am trying to mimic exactly the form of the macro I mentioned above.
References are not first class objects in Pascal, unlike languages such as C++ or D. So the simple answer is that you cannot directly achieve what you want.
Using a pointer as you illustrated is one way to achieve the same effect although in real code you'd need to return the address of an object whose lifetime extends beyond that of the function. In your code that is not the case because the argument x is only valid until the function returns.
You could use an enhanced record with operator overloading to encapsulate the pointer, and so encapsulate the pointer dereferencing code. That may be a good option, but it very much depends on your overall problem, of which we do not have sight.

Convert Vec<T> to Vec<&T> [duplicate]

I can convert Vec<String> to Vec<&str> this way:
let mut items = Vec::<&str>::new();
for item in &another_items {
items.push(item);
}
Are there better alternatives?
There are quite a few ways to do it, some have disadvantages, others simply are more readable to some people.
This dereferences s (which is of type &String) to a String "right hand side reference", which is then dereferenced through the Deref trait to a str "right hand side reference" and then turned back into a &str. This is something that is very commonly seen in the compiler, and I therefor consider it idiomatic.
let v2: Vec<&str> = v.iter().map(|s| &**s).collect();
Here the deref function of the Deref trait is passed to the map function. It's pretty neat but requires useing the trait or giving the full path.
let v3: Vec<&str> = v.iter().map(std::ops::Deref::deref).collect();
This uses coercion syntax.
let v4: Vec<&str> = v.iter().map(|s| s as &str).collect();
This takes a RangeFull slice of the String (just a slice into the entire String) and takes a reference to it. It's ugly in my opinion.
let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect();
This is uses coercions to convert a &String into a &str. Can also be replaced by a s: &str expression in the future.
let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect();
The following (thanks #huon-dbaupp) uses the AsRef trait, which solely exists to map from owned types to their respective borrowed type. There's two ways to use it, and again, prettiness of either version is entirely subjective.
let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect();
and
let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect();
My bottom line is use the v8 solution since it most explicitly expresses what you want.
The other answers simply work. I just want to point out that if you are trying to convert the Vec<String> into a Vec<&str> only to pass it to a function taking Vec<&str> as argument, consider revising the function signature as:
fn my_func<T: AsRef<str>>(list: &[T]) { ... }
instead of:
fn my_func(list: &Vec<&str>) { ... }
As pointed out by this question: Function taking both owned and non-owned string collections. In this way both vectors simply work without the need of conversions.
All of the answers idiomatically use iterators and collecting instead of a loop, but do not explain why this is better.
In your loop, you first create an empty vector and then push into it. Rust makes no guarantees about the strategy it uses for growing factors, but I believe the current strategy is that whenever the capacity is exceeded, the vector capacity is doubled. If the original vector had a length of 20, that would be one allocation, and 5 reallocations.
Iterating from a vector produces an iterator that has a "size hint". In this case, the iterator implements ExactSizeIterator so it knows exactly how many elements it will return. map retains this and collect takes advantage of this by allocating enough space in one go for an ExactSizeIterator.
You can also manually do this with:
let mut items = Vec::<&str>::with_capacity(another_items.len());
for item in &another_items {
items.push(item);
}
Heap allocations and reallocations are probably the most expensive part of this entire thing by far; far more expensive than taking references or writing or pushing to a vector when no new heap allocation is involved. It wouldn't surprise me if pushing a thousand elements onto a vector allocated for that length in one go were faster than pushing 5 elements that required 2 reallocations and one allocation in the process.
Another unsung advantage is that using the methods with collect do not store in a mutable variable which one should not use if it's unneeded.
another_items.iter().map(|item| item.deref()).collect::<Vec<&str>>()
To use deref() you must add using use std::ops::Deref
This one uses collect:
let strs: Vec<&str> = another_items.iter().map(|s| s as &str).collect();
Here is another option:
use std::iter::FromIterator;
let v = Vec::from_iter(v.iter().map(String::as_str));
Note that String::as_str is stable since Rust 1.7.

Is Rust multi-dimensional array row major and tightly packed?

I'm writing a 3D math library for my project, I want to know is the Rust column major or row major? For example I have a 2 dimensional array as matrix and I want to serve it to a C library (like OpenGL or Vulkan), for those library this is important to have a tightly packed column major array.
Well, let's find out:
let arr: [[i8; 2]; 2] = [[1, 2], [8, 9]];
println!(
"{:?} {:?} {:?} {:?}",
&arr[0][0] as *const _,
&arr[0][1] as *const _,
&arr[1][0] as *const _,
&arr[1][1] as *const _,
);
Prints 0x7fff5584ae74 0x7fff5584ae75 0x7fff5584ae76 0x7fff5584ae77 for example. So: yes these arrays with length known to compile time are tightly packed and (considering the common definition of the terms) row major.
Note: the test above doesn't say that this always works! You can read more about this topic here.
But: usually you use heap allocated arrays since you can't know the length beforehand. For that purpose it's idiomatic to use Vec. But there are no special rules for this type, so Vec<Vec<T>> is not tightly packed! For that reason Vec<Vec<T>> is not idiomatic anymore -- you should use a simple Vec<T> and do the calculation of the index yourself.
Of course, writing the indexing calculation multiple times is not a good solution either. Instead, you should define some wrapper type which does the indexing for you. But as Sebastian Redl already mentioned: you are not the only one having this problem and there exist types exactly for this purpose already.

Resources