Assert that a pointer is aligned to some value

Is there a guaranteed way to assert that a given raw pointer is aligned to some alignment value?
I looked at the pointer's align_offset function, but the docs state that it is permissible for it to give false negatives (always return usize::MAX), and that correctness cannot depend on it.
I don't want to fiddle with the alignment at all; I just want to write an assertion that will panic if the pointer is unaligned. My motivation is that, when using certain low-level CPU intrinsics, passing a pointer not aligned to some boundary causes a CPU error, and I'd much rather get a Rust panic message pointing to where the bug causing it is located than a SEGFAULT.
An example assertion (not correct according to the align_offset docs):
#[repr(align(64))]
struct A64(u8);
#[repr(align(32))]
struct A32(u8);
#[repr(align(8))]
struct A8(u8);
fn main() {
    let a64 = [A64(0)];
    let a32 = [A32(0)];
    let a8 = [A8(0), A8(0)];
    println!("Assert for 64 should pass...");
    assert_alignment(&a64);
    println!("Assert for 32 should pass...");
    assert_alignment(&a32);
    println!("Assert for 8, one of the following should fail:");
    println!("- full array");
    assert_alignment(&a8);
    println!("- offset by 8");
    assert_alignment(&a8[1..]);
}
fn assert_alignment<T>(a: &[T]) {
    let ptr = a.as_ptr();
    assert_eq!(ptr.align_offset(32), 0);
}
Rust playground.

Just to satisfy my own neuroses, I went and checked the source of ptr::align_offset.
There's a lot of careful work around edge cases (e.g. const-evaluated it always returns usize::MAX, similarly for a pointer to a zero-sized type, and it panics if alignment is not a power of 2). But the crux of the implementation, for your purposes, is here: it takes (ptr as usize) % alignment == 0 to check if it's aligned.
Edit:
This PR is adding a ptr::is_aligned_to function, which is much more readable and also safer and better reviewed than simply (ptr as usize) % alignment == 0 (though the core of it is still that logic).
There's then some more complexity to calculate the exact offset (which may not be possible), but that's not relevant for this question.
Therefore:
assert_eq!(ptr.align_offset(alignment), 0);
should be plenty for your assertion.
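For reference, the address check described above can also be written out by hand. This is a minimal sketch, not the standard library's code: the helper names is_aligned_to and assert_aligned_to are made up for this illustration, and it assumes a target where casting a pointer to usize yields its numeric address.

```rust
/// Hypothetical helper (not a std API): true if the address of `ptr` is a
/// multiple of `alignment`. Assumes pointers are plain numeric addresses.
fn is_aligned_to<T>(ptr: *const T, alignment: usize) -> bool {
    assert!(alignment.is_power_of_two(), "alignment must be a power of two");
    // For a power of two, `addr % alignment == 0` is the same as this mask test.
    (ptr as usize) & (alignment - 1) == 0
}

/// Hypothetical helper: panics with a readable message if `ptr` is unaligned.
fn assert_aligned_to<T>(ptr: *const T, alignment: usize) {
    assert!(
        is_aligned_to(ptr, alignment),
        "pointer {:p} is not aligned to {} bytes",
        ptr,
        alignment
    );
}

#[repr(align(64))]
struct A64([u8; 64]);

fn main() {
    let a = A64([0; 64]);
    let base = a.0.as_ptr();
    assert_aligned_to(base, 64);
    // One byte past a 64-byte boundary cannot itself be 64-byte aligned.
    assert!(!is_aligned_to(base.wrapping_add(1), 64));
    println!("alignment checks behaved as expected");
}
```

Unlike align_offset, this never computes an offset, so there is no "cannot compute" case to worry about; the trade-off is that it relies on the pointer-to-integer cast being meaningful on the target.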
Incidentally, this proves that the current Rust standard library cannot target anything that does not represent pointers as simple numerical addresses; otherwise this function would not work. In the unlikely situation that the Rust standard library is ported to the Intel 8086 or some weird DSP that doesn't represent pointers in the expected way, this function would have to change. But really, do you care about that hypothetical that much?

Related

What is the correct way to convert a Vec for FFI without reallocation?

I need to pass a Vec of elements across the FFI. Experimenting, I came across a few interesting points. I started with giving the FFI all 3: ptr, len and capacity so that I could reconstruct the Vec to destroy it later:
let ptr = vec.as_mut_ptr();
let len = vec.len();
let cap = vec.capacity();
mem::forget(vec);
extern_fn(ptr, len, cap);
// ...
pub unsafe extern "C" fn free(ptr: *mut u8, len: usize, cap: usize) {
    let _ = Vec::from_raw_parts(ptr, len, cap);
}
I wanted to get rid of capacity as it's useless to my frontend; it's just so that I can reconstruct my vector to free the memory.
Vec::shrink_to_fit() is tempting as it seems to eliminate the need to deal with capacity. Unfortunately, its documentation does not guarantee that it will make len == capacity, so I assume that calling from_raw_parts() with the length in place of the capacity could trigger Undefined Behavior.
into_boxed_slice() seems to have a guarantee that it's going to make len == capacity from the docs, so I used that next. Please correct me if I'm wrong. The problem is that it does not seem to guarantee no-reallocation. Here is a simple program:
fn main() {
    let mut v = Vec::with_capacity(1000);
    v.push(100u8);
    v.push(110);
    let ptr_1 = v.as_mut_ptr();
    let mut boxed_slice = v.into_boxed_slice();
    let ptr_2 = boxed_slice.as_mut_ptr();
    let ptr_3 = Box::into_raw(boxed_slice);
    println!("{:?}. {:?}. {:?}", ptr_1, ptr_2, ptr_3);
}
In the playground, it prints:
rustc 1.14.0 (e8a012324 2016-12-16)
0x7fdc9841b000. 0x7fdc98414018. 0x7fdc98414018
This is not good if it has to find new memory instead of being able to shed off extra capacity without causing a copy.
Is there any other way I can pass my vector across the FFI (to C) and not pass capacity? It seems into_boxed_slice() is what I need, but why does it involve re-allocation and copying data?
The reason is relatively simple.
Modern memory allocators will segregate allocations in "sized" slabs, where each slab is responsible for dealing with a given range of sizes. For example:
8 bytes slab: anything from 1 to 8 bytes
16 bytes slab: anything from 9 to 16 bytes
24 bytes slab: anything from 17 to 24 bytes
...
When you allocate memory, you ask for a given size, the allocator finds the right slab, gets a chunk from it, and returns your pointer.
When you deallocate memory... how do you expect the allocator to find the right slab? There are 2 solutions:
the allocator has a way to search for the slab that contains your range of memory, somehow, which involves either a linear search through the slabs or some kind of global look-up table or ...
you tell the allocator what was the size of the allocated block
It's obvious here that the C interface (free, realloc) is rather sub-par, and therefore Rust wishes to use the more efficient interface instead, the one where the onus is on the caller.
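To make the "onus is on the caller" point concrete, here is a small sketch using Rust's low-level allocation functions from std::alloc. The function name roundtrip is made up for this illustration; the point is only that dealloc has to be given the same Layout (size and alignment) that was used for alloc.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates `n` bytes, writes one byte, reads it back, and frees the block.
/// `roundtrip` is a name invented for this sketch.
fn roundtrip(n: usize) -> u8 {
    assert!(n > 0, "`alloc` must not be called with a zero-sized layout");
    let layout = Layout::array::<u8>(n).expect("size overflows isize");
    unsafe {
        let ptr = alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        ptr.write(100);
        let value = ptr.read();
        // The caller hands the size back: same layout as for `alloc`.
        dealloc(ptr, layout);
        value
    }
}

fn main() {
    println!("{}", roundtrip(1000));
}
```

This is why Vec::from_raw_parts needs the exact capacity: it is what Vec later uses to build the layout it passes when freeing the buffer.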
So, you have two choices:
1. Pass the capacity
2. Ensure that the length and the capacity are equal
As you realized, (2) may require a new allocation, which is quite undesirable. (1) can be implemented either by passing the capacity the whole way, or by stashing it at some point and retrieving it when you need it.
That's it. You have to evaluate your trade-offs.
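As an illustration of option (1), here is one possible way to carry the capacity along in a C-compatible struct. The names RawBuf, into_raw and free_raw are hypothetical, and this is only a sketch under the assumption that the C side treats the struct as an opaque handle and hands it back unchanged.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-compatible handle; the name and fields are assumptions.
#[repr(C)]
pub struct RawBuf {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

/// Gives up ownership of the Vec without freeing or reallocating it.
fn into_raw(v: Vec<u8>) -> RawBuf {
    let mut v = ManuallyDrop::new(v);
    RawBuf { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

/// Safety: `buf` must come from `into_raw` and must not be freed twice.
unsafe fn free_raw(buf: RawBuf) {
    // Rebuild the Vec with its original capacity so it frees the right block.
    unsafe { drop(Vec::from_raw_parts(buf.ptr, buf.len, buf.cap)) };
}

fn main() {
    let mut v = Vec::with_capacity(1000);
    v.push(100u8);
    v.push(110);
    let before = v.as_ptr();
    let raw = into_raw(v);
    // Same buffer as before: nothing was copied or shrunk.
    assert_eq!(raw.ptr as *const u8, before);
    unsafe { free_raw(raw) };
}
```

The frontend only needs ptr and len; cap just rides along until the buffer comes back to Rust to be freed.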

Why is indexing on a slice pointer not allowed in Go?

When I run the below code, I get the compiler error saying that indexing is not supported.
txs := make([]string, 2)
txs[0] = "A"
p := &txs
fmt.Println(p[0])
I'm trying to understand why indexing on the slice pointer is not supported. I can copy what the pointer points to into a value variable and then index that, but I'm curious why the language does not support indexing on a slice pointer; it would be so much more convenient. Or is there a way to do it that I'm not aware of? Please let me know your thoughts.
Write (*p) to dereference the pointer p:
package main
import (
	"fmt"
)
func main() {
	txs := make([]string, 2)
	txs[0] = "A"
	p := &txs
	fmt.Println((*p)[0])
}
Playground: https://play.golang.org/p/6Ex-3jtmw44
Output:
A
There's an abstraction happening there, and the language designers chose not to apply it to the pointer. As a practical reason, this is likely because the pointer doesn't point to the beginning of an array (that is, to the block of memory holding the elements). If you're familiar with indexing, it is generally done with something like startingAddress + index * sizeof(dataType). So when you have the value type, it's already providing an abstraction that hides the extra layer of indirection. I assume the language authors didn't think it made sense to do the same when you have a pointer to the slice object: given that the slice itself points off to the actual memory, that would be pretty misleading. It already causes some confusion as is, but a lot of developers will probably never realize this abstraction exists at all (in most cases there is no noticeable difference in syntax when operating on a slice vs. an array).

How can I modify a collection while also iterating over it?

I have a Board (a.k.a. &mut Vec<Vec<Cell>>) which I would like to update while iterating over it. The new value I want to update with is derived from a function which requires a &Vec<Vec<Cell>> to the collection I'm updating.
I have tried several things:
Use board.iter_mut().enumerate() and row.iter_mut().enumerate() so that I could update the cell in the innermost loop. Rust does not allow calling the next_gen function because it requires a &Vec<Vec<Cell>> and you cannot have an immutable reference when you already have a mutable reference.
Change the next_gen function signature to accept a &mut Vec<Vec<Cell>>. Rust does not allow multiple mutable references to an object.
I'm currently deferring all the updates to a HashMap and then applying them after I've performed my iteration:
fn step(board: &mut Board) {
    let mut cells_to_update: HashMap<(usize, usize), Cell> = HashMap::new();
    for (row_index, row) in board.iter().enumerate() {
        for (column_index, cell) in row.iter().enumerate() {
            let cell_next = next_gen((row_index, column_index), &board);
            if *cell != cell_next {
                cells_to_update.insert((row_index, column_index), cell_next);
            }
        }
    }
    println!("To Update: {:?}", cells_to_update);
    for ((row_index, column_index), cell) in cells_to_update {
        board[row_index][column_index] = cell;
    }
}
Full source
Is there a way that I could make this code update the board "in place", that is, inside the innermost loop while still being able to call next_gen inside the innermost loop?
Disclaimer:
I'm learning Rust and I know this is not the best way to do this. I'm playing around to see what I can and cannot do. I'm also trying to limit any copying to restrict myself a little bit. As oli_obk - ker mentions, this implementation for Conway's Game of Life is flawed.
This code was intended to gauge a couple of things:
if this is even possible
if it is idiomatic Rust
From what I have gathered in the comments, it is possible with std::cell::Cell. However, using std::cell::Cell circumvents some of the core Rust principles, which I described as my "dilemma" in the original question.
Is there a way that I could make this code update the board "in place"?
There exists a type specially made for situations such as these. It's coincidentally called std::cell::Cell. You're allowed to mutate the contents of a Cell even when it has been immutably borrowed multiple times. Cell is limited to types that implement Copy (for others you have to use RefCell, and if multiple threads are involved then you must use an Arc in combination with something like a Mutex).
use std::cell::Cell;
fn main() {
    let board = vec![Cell::new(0), Cell::new(1), Cell::new(2)];
    for a in board.iter() {
        for b in board.iter() {
            a.set(a.get() + b.get());
        }
    }
    println!("{:?}", board);
}
It entirely depends on your next_gen function. Assuming we know nothing about the function except its signature, the easiest way is to use indices:
fn step(board: &mut Board) {
    for row_index in 0..board.len() {
        for column_index in 0..board[row_index].len() {
            let cell_next = next_gen((row_index, column_index), &board);
            if board[row_index][column_index] != cell_next {
                board[row_index][column_index] = cell_next;
            }
        }
    }
}
With more information about next_gen a different solution might be possible, but it sounds a lot like a cellular automaton to me, and to the best of my knowledge this cannot be done in an iterator-way in Rust without changing the type of Board.
You might fear that the indexing solution will be less efficient than an iterator solution, but you should trust LLVM on this. In case your next_gen function is in another crate, you should mark it #[inline] so LLVM can optimize it too (not necessary if everything is in one crate).
Not an answer to your question, but to your problem:
Since you are implementing Conway's Game of Life, you cannot do the modification in-place. Imagine the following pattern:
00000
00100
00100
00100
00000
If you update line 2, it will change the 1 in that line to a 0 since it has only two 1s in its neighborhood. This will cause the middle 1 to see only two 1s instead of the three that were there to begin with. Therefore you always need to either make a copy of the entire Board, or, as you did in your code, write all the changes to some other location, and splice them in after going through the entire board.
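A minimal sketch of the "copy the entire board" variant, under some assumptions: the board is simplified here to Vec<Vec<bool>>, cells outside the board count as dead, and live_neighbours is a helper invented for this example rather than the asker's next_gen.

```rust
type Board = Vec<Vec<bool>>;

/// Counts live cells among the up-to-eight neighbours of (row, col).
/// Cells outside the board are treated as dead.
fn live_neighbours(board: &Board, row: usize, col: usize) -> usize {
    let mut count = 0;
    for dr in -1isize..=1 {
        for dc in -1isize..=1 {
            if dr == 0 && dc == 0 {
                continue;
            }
            let (r, c) = (row as isize + dr, col as isize + dc);
            if r < 0 || c < 0 {
                continue;
            }
            let alive = board
                .get(r as usize)
                .and_then(|line| line.get(c as usize))
                .copied();
            if alive == Some(true) {
                count += 1;
            }
        }
    }
    count
}

/// Advances one generation, reading only from a snapshot of the old state.
fn step(board: &mut Board) {
    let snapshot = board.clone();
    for (r, row) in board.iter_mut().enumerate() {
        for (c, cell) in row.iter_mut().enumerate() {
            let n = live_neighbours(&snapshot, r, c);
            *cell = matches!((snapshot[r][c], n), (true, 2) | (true, 3) | (false, 3));
        }
    }
}

fn main() {
    // The vertical three-cell pattern from the text above.
    let mut board = vec![vec![false; 5]; 5];
    for r in 1..=3 {
        board[r][2] = true;
    }
    step(&mut board);
    for row in &board {
        let line: String = row.iter().map(|&alive| if alive { '1' } else { '0' }).collect();
        println!("{}", line);
    }
}
```

Because every neighbour count is taken from snapshot, the order in which cells are written no longer matters, and the borrow checker is satisfied: board is borrowed mutably while only snapshot is read.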

Explaining C declarations in Rust

I need to rewrite these C declarations in Go and Rust for a set of practice problems I am working on. I figured out the Go part, but I am having trouble with the Rust part. Any ideas or help to write these in Rust?
double *a[n];
double (*b)[n];
double (*c[n])();
double (*d())[n];
Assuming n is a constant:
let a: [*mut f64; n]; // double *a[n];
let b: *mut [f64; n]; // double (*b)[n];
let c: [fn() -> f64; n]; // double (*c[n])();
fn d() -> *mut [f64; n]; // double (*d())[n];
These are rather awkward and unusual types in any language. Rust's syntax, however, makes these declarations a lot easier to read than C's syntax does.
Note that d in C is a function declaration. In Rust, external function declarations are only allowed in extern blocks (see the FFI guide).
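To see these types in a compilable program, here is a small sketch using the [T; N] array syntax. Everything beyond the four types is an assumption made for illustration: N is fixed at 3, one is a placeholder function, and d is given a body that heap-allocates an array so that it has something to return.

```rust
use std::ptr;

const N: usize = 3;

/// Placeholder function so the function-pointer array has something to hold.
fn one() -> f64 {
    1.0
}

/// double (*d())[n];  Here it returns a heap-allocated array; the caller frees it.
fn d() -> *mut [f64; N] {
    Box::into_raw(Box::new([0.0; N]))
}

fn main() {
    // double *a[n];  an array of N pointers to f64
    let mut x = 2.5_f64;
    let mut a: [*mut f64; N] = [ptr::null_mut(); N];
    a[0] = &mut x;
    unsafe { *a[0] += 1.0 };

    // double (*b)[n];  a pointer to an array of N f64
    let mut arr = [1.0_f64; N];
    let b: *mut [f64; N] = &mut arr;
    unsafe { (&mut *b)[1] = 7.0 };

    // double (*c[n])();  an array of N function pointers returning f64
    let c: [fn() -> f64; N] = [one as fn() -> f64; N];

    // Take ownership back so the allocation made in `d` is freed on drop.
    let mut owned = unsafe { Box::from_raw(d()) };
    owned[2] = c[0]();

    println!("{} {} {}", x, arr[1], owned[2]);
}
```

Dereferencing any of the raw pointers needs an unsafe block, which is where Rust makes the C-style risk visible.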
The answer depends on what, exactly, the * is for. For example, is the first one being used as an array of pointers to doubles, or is it an array of arrays of doubles? Are the pointers nullable or not?
Also, is n a constant or not? If it is, then you want an array; if it's not, you want a Vec.
Also also, are these global or local declarations? Are they function arguments? There's different syntax involved for each.
Frankly, without more context, it's impossible to answer this question with any accuracy. Instead, I will give you the following:
The Rust documentation contains all the information you'll need, although it's spread out a bit. Check the reference and any appropriate-looking guides. The FFI Guide is probably worth looking at.
cdecl is a website that will unpick C declarations if that's the part you're having difficulty with. Just note that you'll have to remove the semicolon and the n or it won't parse.
The floating point types in Rust are f32 and f64, depending on whether you're using float or double. Also, don't get caught: Rust's pointer-sized integer types, isize and usize (called int and uint before Rust 1.0), are not equivalent to int in C. Prefer explicitly-sized types like i32 or u64, or types from libc like c_int. isize and usize should only be used with explicitly pointer-sized values.
Normally, you'd write a reference to a T as &T or &mut T, depending on desired mutability (default in C is mutable, default in Rust is immutable).
If you want a nullable reference, use Option<&T>.
If you are trying to use these in a context where you start getting complaints about needing "lifetimes"... well, you're just going to have to learn the language. At that point, simple translation isn't going to work very well.
In Rust, array types are written as brackets around the element type. So an "array of doubles" would be [f64], and an array of size n would be [f64; n]. Typically, however, the actual equivalent to, say, double[] in C would be &[f64]; that is, a reference to an array, rather than the actual contents of the array.
Use of "raw pointers" is heavily discouraged in Rust, and you cannot use them meaningfully outside of unsafe code. In terms of syntax, a pointer to T is *const T or *mut T, depending on whether it's a pointer to constant or mutable data.
Function pointers are just written as fn (Args...) -> Result. So a function that takes nothing and returns a double would be fn () -> f64.
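A short sketch of two of the points above, nullable references and slices standing in for double[]. The function first_or_zero is invented for this example; it is not part of any library.

```rust
use std::mem::size_of;

/// `Option<&[f64]>` plays the role of a possibly-NULL `double[]` argument.
/// `first_or_zero` is a name made up for this sketch.
fn first_or_zero(values: Option<&[f64]>) -> f64 {
    match values {
        Some(v) => v.first().copied().unwrap_or(0.0),
        None => 0.0,
    }
}

fn main() {
    // Option<&T> has the same size as a raw pointer: None takes the place of NULL.
    assert_eq!(size_of::<Option<&f64>>(), size_of::<*const f64>());

    let data = [2.0, 3.0];
    println!("{}", first_or_zero(Some(&data[..])));
    println!("{}", first_or_zero(None));
}
```

The slice carries its length with it, so there is no separate n to pass, and the None case has to be handled before the data can be touched.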

Low level pointers in haskell

The foreign function interface allows Haskell to work with the C world. The Haskell side allows working with pointers through Storable instances. So, for example, if I have an array of integers in the C world, a plausible representation of it in the Haskell world would be Ptr Int. Now suppose I want to translate the C expression a[0] = a[0] + 1. The only way to do that on the Haskell side is to peek the int out and then poke back the result of the addition. The problem with this approach is that a temporary value is created as a result. (I am not sure an optimizing compiler can always avoid doing that.)
Most people might think this effect is harmless, but think of a situation where the Pointer object contains some sensitive data. I have created this pointer on the C side in such a way that it is always guaranteed that its content will never be swapped out of memory (using the mlock system call). Peeking the result on the Haskell side no longer guarantees the security of the sensitive data.
So what is the best way to avoid that in the Haskell world? Has anybody else run into similar problems with low-level pointer manipulation in Haskell?
I just built a test case with the code:
foo :: Ptr CInt -> IO ()
foo p = peek p >>= poke p . (+1)
And using GHC 7.6.3 -fllvm -O2 -ddump-asm I see the relevant instructions:
0x0000000000000061 <+33>: mov 0x7(%r14),%rax
0x0000000000000065 <+37>: incl (%rax)
So it loads an address into rax and increments the memory at that address. Seems to be what you'd get in other languages, but let's see.
With C, I think the fair comparison is:
void foo(int *p)
{
    p[0]++;
}
Which results in:
0x0000000000000000 <+0>: addl $0x1,(%rdi)
All this said, I freely admit that it is not clear to me what you are concerned about so I might have missed your point and in doing so addressed the wrong thing.
