Why must pointers used by `offset_from` be derived from a pointer to the same object?

From the standard library:
Both pointers must be derived from a pointer to the same object. (See below for an example.)
let ptr1 = Box::into_raw(Box::new(0u8));
let ptr2 = Box::into_raw(Box::new(1u8));
let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
// Make ptr2_other an "alias" of ptr2, but derived from ptr1.
let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
assert_eq!(ptr2 as usize, ptr2_other as usize);
// Since ptr2_other and ptr2 are derived from pointers to different
// objects, computing their offset is undefined behavior, even though
// they point to the same address!
unsafe {
    let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
}
I do not understand why this must be the case.

This has to do with a concept called "provenance", meaning "the place of origin". The Rust Unsafe Code Guidelines have a section on Pointer Provenance. It's a fairly abstract rule, but in short, provenance is extra information attached to a pointer and tracked during compilation that determines which pointer operations are well defined.
// Let's assume the two allocations here have base addresses 0x100 and 0x200.
// We write pointer provenance as `#N` where `N` is some kind of ID uniquely
// identifying the allocation.
let raw1 = Box::into_raw(Box::new(13u8));
let raw2 = Box::into_raw(Box::new(42u8));
let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
// These pointers now have the following values:
// raw1 points to address 0x100 and has provenance #1.
// raw2 points to address 0x200 and has provenance #2.
// raw2_wrong points to address 0x200 and has provenance #1.
// In other words, raw2 and raw2_wrong have same *address*...
assert_eq!(raw2 as usize, raw2_wrong as usize);
// ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*:
// it points to address 0x200, which is in allocation #2, but the pointer
// has provenance #1.
The guidelines link to a good article: Pointers Are Complicated and its follow up Pointers Are Complicated II that go into more detail and coined the phrase:
Just because two pointers point to the same address, does not mean they are equal and can be used interchangeably.
Essentially, it is invalid to read a value via a pointer that is outside that pointer's original "allocation" even if you can guarantee a valid object exists there. Allowing such behavior could wreak havoc on the language's aliasing rules and possible optimizations. And there's pretty much never a good reason to do it.
This concept is mostly inherited from C and C++.
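By contrast, offset_from is well defined when both pointers come from the same allocation. A minimal sketch, assuming a hypothetical helper element_distance (not part of the standard library) that measures the distance between two positions of one slice:

```rust
/// Hypothetical helper: signed distance, in elements, from index `from`
/// to index `to` of the same slice.
fn element_distance(data: &[u8], from: usize, to: usize) -> isize {
    // Indices may be at most one past the end, which `offset_from` permits.
    assert!(from <= data.len() && to <= data.len());
    let base = data.as_ptr();
    // Both pointers are derived from `base`, so they carry the same provenance.
    let a = base.wrapping_add(from);
    let b = base.wrapping_add(to);
    // SAFETY: `a` and `b` are derived from the same allocation and stay
    // within its bounds (or one past the end).
    unsafe { b.offset_from(a) }
}

fn main() {
    let buf = [10u8, 20, 30, 40];
    println!("{}", element_distance(&buf, 1, 3)); // prints 2
    println!("{}", element_distance(&buf, 3, 0)); // prints -3
}
```

Because both pointers are computed from the same base pointer, there is no second allocation whose provenance could be mixed in.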
If you're curious whether you've written code that violates this rule, running it through Miri, the undefined behavior analysis tool, can often find it.
fn main() {
    let ptr1 = Box::into_raw(Box::new(0u8));
    let ptr2 = Box::into_raw(Box::new(1u8));
    let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
    let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
    assert_eq!(ptr2 as usize, ptr2_other as usize);
    unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
}
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
--> src/main.rs:7:49
|
7 | unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
| ^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

Related

How to traverse character elements of *const char pointer in Rust?

I'm new to Rust programming and I have a bit of difficulty where this language differs from C. For example, I have a C function as follows:
bool check(char* data, int size){
    int i;
    for(i = 0; i < size; i++){
        if( data[i] != 0x00){
            return false;
        }
    }
    return true;
}
How can I convert this function to Rust? I tried it like C, but it has Errors :((
First off, I assume that you want to use as little unsafe code as possible. Otherwise there really isn't any reason to use Rust in the first place, as you forfeit all the advantages it brings you.
Depending on what data represents, there are multiple ways to transfer this to Rust.
First off: Using a pointer and a length as two separate arguments is not possible in Rust without unsafe. Rust has the same concept, though; it's called a slice. A slice is exactly the same as a pointer-size combination, except that the compiler understands it and checks at compile time that it is used correctly.
That said, a char* in C could actually be one of four things. Each of those things map to different types in Rust:
1. Binary data whose deallocation is taken care of somewhere else (in Rust terms: borrowed data)
   - maps to &[u8], a slice. The actual content of the slice is:
     - the address of the data as *const u8 (hidden from the user)
     - the length of the data as usize
2. Binary data that has to be deallocated within this function after using it (in Rust terms: owned data)
   - maps to Vec<u8>; as soon as it goes out of scope the data is deleted
   - actual content is:
     - the address of the data as a pointer to u8 (hidden from the user)
     - the length of the data as usize
     - the size of the allocation as usize. This allows for efficient push()/pop() operations. It is guaranteed that the length of the data does not exceed the size of the allocation.
3. A string whose deallocation is taken care of somewhere else (in Rust terms: a borrowed string)
   - maps to &str, a so-called string slice.
   - This is identical to &[u8] with the additional guarantee that it contains valid UTF-8 data.
4. A string that has to be deallocated within this function after using it (in Rust terms: an owned string)
   - maps to String
   - same as Vec<u8> with the additional guarantee that it contains valid UTF-8 data.
You can create &[u8] references from Vec<u8>'s and &str references from Strings.
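A short sketch of that borrowing, assuming two illustrative helper functions (byte_len and char_count are made up for this example): a function that takes &[u8] or &str accepts the owned types by reference as well.

```rust
/// Accepts any borrowed byte buffer.
fn byte_len(data: &[u8]) -> usize {
    data.len()
}

/// Accepts any borrowed UTF-8 text.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let owned_bytes: Vec<u8> = vec![0, 1, 2];
    let owned_text = String::from("h\u{e9}llo"); // "héllo"

    // &Vec<u8> coerces to &[u8], and &String coerces to &str.
    println!("{}", byte_len(&owned_bytes)); // prints 3
    println!("{}", char_count(&owned_text)); // prints 5

    // A &str can also be viewed as its underlying UTF-8 bytes.
    println!("{}", byte_len(owned_text.as_bytes())); // prints 6
}
```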
Now this is the point where I have to make an assumption. Because the function that you posted checks if all of the elements of data are zero, and returns false if it finds a non-zero element, I assume the content of data is binary data. And because your function does not contain a free call, I assume it is borrowed data.
With that knowledge, this is how the given function would translate to Rust:
fn check(data: &[u8]) -> bool {
    for d in data {
        if *d != 0x00 {
            return false;
        }
    }
    true
}
fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));
    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
This is quite a direct translation; it's not really idiomatic to use for loops a lot in Rust. Good Rust code is mostly iterator based; iterators are, most of the time, zero-cost abstractions that compile to very efficient code.
This is what your code would look like if rewritten using iterators:
fn check(data: &[u8]) -> bool {
    data.iter().all(|el| *el == 0x00)
}
fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));
    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
The reason this is more idiomatic is that it's a lot easier to read for someone who hasn't written it. It clearly says "return true if all elements are equal to zero". The for-based code takes a second of thought to work out whether it's "all elements are zero", "any element is zero", "all elements are non-zero" or "any element is non-zero".
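To make the readability point concrete, here is a sketch of all four variants written with iterators. The function names are invented for this illustration; the intent of each is visible from the adapter (all or any) and the comparison alone.

```rust
fn all_zero(data: &[u8]) -> bool {
    data.iter().all(|&b| b == 0)
}

fn any_zero(data: &[u8]) -> bool {
    data.iter().any(|&b| b == 0)
}

fn all_nonzero(data: &[u8]) -> bool {
    data.iter().all(|&b| b != 0)
}

fn any_nonzero(data: &[u8]) -> bool {
    data.iter().any(|&b| b != 0)
}

fn main() {
    let data = [0u8, 1, 0];
    println!(
        "{} {} {} {}",
        all_zero(&data),
        any_zero(&data),
        all_nonzero(&data),
        any_nonzero(&data)
    ); // prints "false true false true"
}
```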
Note that the for-loop and iterator versions of check typically compile to the same machine code when optimizations are enabled.
Also note that, unlike the C version, the Rust borrow checker guarantees at compile time that data is valid. It's impossible in Rust (without unsafe) to produce a double free, a use-after-free, an out-of-bounds array access or any other kind of undefined behaviour that would cause memory corruption.
This is also the reason why Rust doesn't do pointers without unsafe: it needs the length of the data to check for out-of-bounds errors at runtime. That means accessing data via the [] operator is a little more costly in Rust (as it performs an out-of-bounds check every time), which is one reason why iterator-based programming is a thing. Iterators can iterate over data a lot more efficiently than directly accessing it via the [] operator.
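If the data really does arrive as a raw pointer plus a length (for example from C), one possible bridge is to build a slice once inside a small unsafe wrapper and keep the rest of the code safe. This is only a sketch: check_raw is a hypothetical name, the pointer is typed as *const u8 for simplicity (a *const c_char would need a cast first), and treating a null pointer as false is an arbitrary choice.

```rust
use std::slice;

fn check(data: &[u8]) -> bool {
    data.iter().all(|&b| b == 0)
}

/// Hypothetical wrapper for callers that only have a pointer and a length.
///
/// # Safety
/// If `data` is non-null it must point to `size` initialized bytes that stay
/// valid and unmodified for the duration of the call.
unsafe fn check_raw(data: *const u8, size: usize) -> bool {
    if data.is_null() {
        // Assumption: a null pointer is reported as "not all zero".
        return false;
    }
    // SAFETY: guaranteed by the caller, see the contract above.
    let bytes = unsafe { slice::from_raw_parts(data, size) };
    check(bytes)
}

fn main() {
    let zeros = [0u8; 4];
    let mixed = [0u8, 7, 0];
    // SAFETY: both pointers come from live arrays of the stated length.
    let (a, b) = unsafe {
        (
            check_raw(zeros.as_ptr(), zeros.len()),
            check_raw(mixed.as_ptr(), mixed.len()),
        )
    };
    println!("{} {}", a, b); // prints "true false"
}
```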

Why is the mutability of a variable not reflected in its type signature in Rust?

As I understand it, mutability is not reflected in a variable's type signature. For example, these two references have the same type signature &i32:
let ref_foo : &i32 = &foo;
let mut ref_bar : &i32 = &bar;
Why is this the case? It seems like a pretty major oversight. I mean, even C/C++ does this more explicitly by having two consts to indicate that we have a const pointer to const data:
const int * const ptr_foo = &foo;
const int * ptr_bar = &bar;
Is there a better way of thinking about this?
Mutability is a property of a binding in Rust, not a property of the type.
The sole owner of a value can always mutate it by moving it to a mutable binding:
let s = "Hi".to_owned(); // Create an owned value.
s.push('!'); // Error because s is immutable.
let mut t = s; // Move owned value to mutable binding.
t.push('!'); // Now we can modify the string.
This shows that mutability is not a property of the type of a value, but rather of its binding. The code of course only works if the value isn't currently borrowed, which would block moving the value. A shared borrow is still guaranteed to be immutable.
Mutability of references is orthogonal to mutability of bindings. Rust uses the same mut keyword to disambiguate the two types of references, but it's a separate concept.
The interior mutability pattern is again orthogonal to the above, as it is part of the type. Types containing a Cell, RefCell or similar can be modified even when only holding a shared reference to them.
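As a small illustration of interior mutability, the sketch below mutates a struct through a shared reference only. Counter and record are names invented for this example.

```rust
use std::cell::{Cell, RefCell};

struct Counter {
    hits: Cell<u32>,
    log: RefCell<Vec<String>>,
}

/// Takes `&Counter`, not `&mut Counter`, yet still updates both fields.
fn record(counter: &Counter, message: &str) {
    counter.hits.set(counter.hits.get() + 1);
    counter.log.borrow_mut().push(message.to_owned());
}

fn main() {
    // The binding is not `mut` either.
    let counter = Counter {
        hits: Cell::new(0),
        log: RefCell::new(Vec::new()),
    };
    record(&counter, "first");
    record(&counter, "second");
    println!("{} {:?}", counter.hits.get(), counter.log.borrow());
    // prints: 2 ["first", "second"]
}
```

Here the ability to mutate is visible in the field types (Cell, RefCell), which is what makes it part of the type rather than of the binding.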
It's a common pattern to rebind a value as immutable once you are done mutating a value:
let mut x = ...;
// modify x ...
let x = x;
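A runnable version of that rebinding pattern might look like this (squares_up_to is a made-up example function):

```rust
fn squares_up_to(n: u32) -> Vec<u32> {
    let mut values = Vec::new();
    for i in 1..=n {
        values.push(i * i);
    }
    // Rebind as immutable: from here on, `values` cannot be modified.
    let values = values;
    // values.push(0); // would fail to compile: `values` is not mutable
    values
}

fn main() {
    println!("{:?}", squares_up_to(4)); // prints [1, 4, 9, 16]
}
```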
Ownership semantics and the type system in Rust are somewhat different than C++, and I prefer the Rust way. I don't think it's inherently less expressive, as you seem to suggest.
Constants in C++ and Rust are fundamentally different. In C++ constness is a property of any type, while in Rust it is a property of a reference. Thus, in Rust there are no true constant types.
Take for example this C++ code:
void test() {
    const std::string x;
    const std::string *p = &x;
    const std::string &r = x;
}
Variable x is declared with a constant type, so any reference created to it will also be to a constant, and any attempt to modify it (with const_cast, for example) results in undefined behavior. Note how const is part of the type of the object.
In Rust, however, there is no way to declare a constant variable:
fn test() {
    let x = String::new();
    let r = &x;
    let mut x = x; // moved, not copied, now it is mutable!
    let r = &mut x;
}
Here, the const-ness or mut-ness is not part of the type of the variable, but a property of each reference. And even the original name of the variable can be considered a reference.
Because when you declare a local variable, either in C++ or Rust, you are actually doing two things:
Creating the object itself.
Declaring a name to access the object, a reference of sorts.
When you write a C++ constant you are making both constant, the object and the reference. But in Rust there are no constant objects, so only the reference is constant. If you move the object, you dispose of the original name and bind it to a new one, which may or may not be mutable.
Note that in C++ you cannot move a constant object, it will remain constant forever.
About having two consts for pointers, they are just the same in Rust, if you have two indirections:
fn test() {
    let mut x = String::new();
    let p: &mut String = &mut x;
    let p2: &&mut String = &p;
}
About what is better, that is a matter of taste, but remember all the weird things that a constant can do in C++:
A constant object is always constant, except when it is not: constructors and destructors.
A constant class with mutable members is not truly constant. mutable is not part of the type system, while Rust's Cell/RefCell are.
A class with a constant member is a pain to work with: default constructors and copy/move operators do not work.
In C++ everything is mutable by default and the const keyword indicates that you want to change that behavior.
In Rust everything is immutable by default, and the mut keyword indicates that you want to change that behavior.
Note that for pointers, Rust does require either the mut or const keyword:
let ref_foo : *const i32 = &foo;
let mut ref_bar : *const i32 = &bar;
Your examples are therefore equivalent, but Rust is less verbose as it defaults to immutable.
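For raw pointers, the const/mut distinction can be sketched as follows. The two helper functions are hypothetical and only show that a shared reference coerces to *const T while a mutable reference coerces to *mut T.

```rust
fn read_via_const(value: &i32) -> i32 {
    let p: *const i32 = value; // a shared reference coerces to *const
    // SAFETY: `p` comes from a live reference.
    unsafe { *p }
}

fn increment_via_mut(value: &mut i32) {
    let p: *mut i32 = value; // a mutable reference coerces to *mut
    // SAFETY: `p` comes from a live, exclusive reference.
    unsafe { *p += 1 }
}

fn main() {
    let mut n = 41;
    increment_via_mut(&mut n);
    println!("{}", read_via_const(&n)); // prints 42
}
```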
even C/C++ does this better
Years of experience in C++ and Rust development have convinced me that Rust's way of dealing with mutability (e.g. defaulting to immutable, but there are other differences) is far better.

Getting pointer by &str

Consider this pseudocode:
let k = 10;
let ptr = &k as *const i32;
println!("{:p}", ptr); // prints the address stored in the pointer
let addr = format!("{:p}", ptr);
super-unsafe {
    // this would obviously be super unsafe. It may even cause a STATUS_ACCESS_VIOLATION if you try getting memory from a page that the OS didn't allocate to the program!
    let ptr_gen = PointerFactory::from_str(addr.as_str());
    assert_eq!(k, *ptr_gen);
}
The pseudocode gets the idea across: I want to be able to get a pointer to a certain memory address by its &str representation. Is this... possible?
So essentially what you want to do is parse the string back to an integer (usize) and then interpret that value as a pointer/reference†:
fn main() {
    let i = 12i32;
    let r = format!("{:p}", &i);
    let x = unsafe {
        let r = r.trim_start_matches("0x");
        &*(usize::from_str_radix(&r, 16).unwrap() as *const i32)
    };
    println!("{}", x);
}
You can try this yourself in the playground.
†As you can see, you don't even need to cast your reference into a raw pointer, the {:p} formatter takes care of representing it as a memory location (index).
Update: As E_net4 mentioned in the comment section, it is better to use usize here, since its size matches the pointer width of the target architecture, unlike a fixed-size integer type. The transmute was not necessary, so I removed it. The third point about undefined behaviour, however, seems obvious to whoever tries to do something like the above. This answer provides a way to achieve what the OP asked for, which doesn't mean this should be used for anything other than academic/experimental purposes :)
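The parsing step can also be separated from the unsafe part so that malformed input is reported rather than causing a panic. A minimal sketch, assuming a hypothetical helper parse_address that only recovers the integer address; turning that integer back into a pointer and dereferencing it remains as unsafe as described above.

```rust
/// Hypothetical helper: parses the output of `{:p}` (e.g. "0x7ffd1234")
/// back into an integer address.
fn parse_address(text: &str) -> Option<usize> {
    let digits = text.trim().strip_prefix("0x")?;
    usize::from_str_radix(digits, 16).ok()
}

fn main() {
    let value = 12i32;
    let printed = format!("{:p}", &value);

    // The parsed integer equals the address of `value`.
    assert_eq!(parse_address(&printed), Some(&value as *const i32 as usize));

    // Malformed input yields None instead of panicking.
    assert_eq!(parse_address("not a pointer"), None);

    println!("{}", printed);
}
```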

Is there a universal Rust pointer type that can store any other kind of pointer, an analog of C's void *?

I want to create a C FFI API for my crate, but it's not clear how safe it is to cast pointers. Pseudocode:
#[no_mangle]
extern "C" fn f(...) -> *mut c_void {
    let t: Box<T> = ...;
    let p = Box::into_raw(t);
    p as *mut c_void
}
This works as expected, but how safe is it? In C or C++, there is a special void * pointer and the C++ standard declares that it is safe to cast to it. Potentially, sizeof(void *) may not be equal to sizeof(T *), but there is a guarantee that sizeof(void *) >= sizeof(T *).
What about Rust? Is there any guarantee about the std::mem::size_of of a pointer or safe casting between pointers? Or do all pointers have equal size by implementation, equal to usize?
By "universal", I mean that you can convert X * without losing anything. I do not care about type information; I care about different sizes of pointers to different things, like near/far pointers in the 16-bit days.
Section 4.10 of the C++ standard says
The result of converting a "pointer to cv T" to a "pointer to cv void" points to the start of the storage location where the object of type T resides,
It is impossible that sizeof(void *) < sizeof(T *), because then the void * could not hold the real address of the storage location.
No.
Rust's raw pointers (and references) currently come in two flavors:
thin (one native-sized integer in size)
fat (two native-sized integers in size)
use std::mem;
fn main() {
    println!("{}", mem::size_of::<*const u8>()); // 8
    println!("{}", mem::size_of::<*const [u8]>()); // 16
}
There's no type that allows storing both; even the Big Hammer of mem::transmute won't work:
use std::mem;
unsafe fn example(mut thin: *const u8, mut fat: *const [u8]) {
    fat = mem::transmute(thin);
    thin = mem::transmute(fat);
}
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:4:11
|
4 | fat = mem::transmute(thin);
| ^^^^^^^^^^^^^^
|
= note: source type: `*const u8` (64 bits)
= note: target type: `*const [u8]` (128 bits)
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:5:12
|
5 | thin = mem::transmute(fat);
| ^^^^^^^^^^^^^^
|
= note: source type: `*const [u8]` (128 bits)
= note: target type: `*const u8` (64 bits)
Since the layout of fat pointers is a Rust-specific concept, they should never be accessed via FFI. This means that only thin pointers should be used, all of which have a uniform known size.
For those types, you should use an opaque pointer to provide better type safety. You could also use *const () or *const libc::c_void.
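A sketch of the thin, untyped handle approach using *mut c_void is shown below. Session and the three session_* functions are hypothetical names for this example; a real exported API would also need an attribute such as #[no_mangle] on each function, which is left out here to keep the snippet independent of the Rust edition.

```rust
use std::ffi::c_void;

/// Hypothetical Rust-side state hidden behind the handle.
struct Session {
    counter: u32,
}

/// Hands ownership of a new `Session` to the caller as an untyped thin pointer.
pub extern "C" fn session_new() -> *mut c_void {
    Box::into_raw(Box::new(Session { counter: 0 })) as *mut c_void
}

/// # Safety
/// `handle` must come from `session_new` and must not have been freed.
pub unsafe extern "C" fn session_bump(handle: *mut c_void) -> u32 {
    // SAFETY: the caller guarantees `handle` points to a live `Session`.
    let session = unsafe { &mut *(handle as *mut Session) };
    session.counter += 1;
    session.counter
}

/// # Safety
/// `handle` must come from `session_new`; it is invalid afterwards.
pub unsafe extern "C" fn session_free(handle: *mut c_void) {
    // SAFETY: the caller guarantees this is the original Box pointer.
    drop(unsafe { Box::from_raw(handle as *mut Session) });
}

fn main() {
    let handle = session_new();
    let (first, second) = unsafe { (session_bump(handle), session_bump(handle)) };
    unsafe { session_free(handle) };
    println!("{} {}", first, second); // prints "1 2"
}
```

The cast is only sound because Session is a sized type, so the pointer to it is thin, and because the pointer is cast back to the same type it was created from.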
See also:
What's the Rust idiom to define a field pointing to a C opaque pointer?
Why can comparing two seemingly equal pointers with == return false?
How do I pass a closure through raw pointers as an argument to a C function?
In C or C++, there is special void * pointer and the C++ standard declares that it is safe to cast to it.
This isn't always true:
Why can't I cast a function pointer to (void *)?

Pointer to trait

When I started learning Rust, I naively assumed Rust's pointers to traits were implemented just like a C++ pointer to a base class, and wrote some code that worked even under that assumption. Specifically, the code I wrote interfaced with an FFI library that needed to read and seek a stream, and it was something like this:
struct StreamParts {
    reader: *mut Read,
    seeker: *mut Seek,
}
fn new_ffi_object<T: Read + Seek + 'static>(stream: T) -> FFIObject {
    let stream_ptr = Box::into_raw(Box::new(stream));
    let stream_parts = Box::into_raw(Box::new(StreamParts {
        reader: stream_ptr as *mut Read,
        seeker: stream_ptr as *mut Seek,
    }));
    ffi_library::new_object(stream_parts, ffi_read, ffi_seek, ffi_close)
}
extern "C" fn ffi_read(stream_parts: *mut StreamParts, ...) -> c_ulong {
    (*stream_parts.reader).read(...)
    ...
}
extern "C" fn ffi_seek(stream_parts: *mut StreamParts, ...) -> c_ulong {
    (*stream_parts.seeker).seek(...)
    ...
}
extern "C" fn ffi_close(stream_parts: *mut StreamParts) {
    mem::drop(Box::from_raw(stream_parts.reader));
    mem::drop(Box::from_raw(stream_parts));
}
And it worked. However, there are three things I don't fully understand about why it works:
Rust's trait objects are fat, containing two pointers. Thus, unlike C++, *mut Read is a pointer to a trait object, correct? And where is this trait object allocated? The Rust docs don't touch on this specific case.
Am I correct to assume that mem::drop(Box::from_raw(stream_parts.reader)) fully drops the original stream?
Why is the 'static needed in new_ffi_object()?
Pointers and references behave exactly the same, except for the borrow-checker which forbids you to have dangling references and the fact that you need to wrap pointer dereferencing into an unsafe block.
So yes, size_of::<*mut Read>() == size_of::<*mut ()>() * 2. The trait object isn't allocated anywhere. It's nothing more than a struct with two fields: one is a pointer to your data, and the other is a pointer to the vtable. The vtable is allocated in static memory.
Correct. It accesses the vtable pointer of reader and looks up the drop impl in the vtable.
If you didn't have a 'static lifetime, your T might contain references with lifetimes shorter than 'static. All that lifetime bound says is that T doesn't have such references and may thus be copied anywhere without restrictions, even on the heap.
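The first two points can be checked with a small sketch. Speak and Dog are invented for this example and written with the newer dyn syntax (*mut dyn Speak); the code confirms that a raw pointer to a trait object is two words wide and that dropping a Box rebuilt from it runs the concrete type's destructor.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::rc::Rc;

trait Speak {
    fn speak(&self) -> String;
}

struct Dog {
    dropped: Rc<Cell<bool>>,
}

impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_owned()
    }
}

impl Drop for Dog {
    fn drop(&mut self) {
        // Record that the concrete destructor ran.
        self.dropped.set(true);
    }
}

/// A pointer to a trait object holds a data pointer and a vtable pointer.
fn fat_pointer_is_two_words() -> bool {
    size_of::<*mut dyn Speak>() == 2 * size_of::<*mut ()>()
}

/// Calls a method and then drops the value through the trait pointer only.
fn drop_through_trait_pointer() -> (String, bool) {
    let flag = Rc::new(Cell::new(false));
    let raw: *mut dyn Speak = Box::into_raw(Box::new(Dog {
        dropped: Rc::clone(&flag),
    }));
    // SAFETY: `raw` came from Box::into_raw and is still live.
    let sound = unsafe { (*raw).speak() };
    // SAFETY: the Box is reconstituted exactly once.
    unsafe { drop(Box::from_raw(raw)) };
    (sound, flag.get())
}

fn main() {
    println!("{}", fat_pointer_is_two_words()); // prints true
    println!("{:?}", drop_through_trait_pointer()); // prints ("woof", true)
}
```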

Resources