When a value that is not Copy gets moved, what happens to the raw pointers that point to the moved value? (Rust)

Check TLDR at the bottom if you are not interested in me rambling about stuff I don't understand :).
I am trying to implement the A-star pathfinding algorithm in Rust.
Just a basic version. I don't want to use Rc, and my nodes store a *const (raw pointer) to the previous Node (previous: *const Node) so that I can retrieve the shortest path (solution).
struct Node {
    position: Position,
    // NonNull also works I guess
    previous: Option<*const Node>
}
Also I am using PriorityQueue (crate link) where I store my nodes. My knowledge is very limited regarding memory management and low-level stuff.
Okay now I will first talk a bit what happens inside my program to better clarify what I am asking.
To my understanding, my nodes get stored/moved into PriorityQueue that is actually a HashMap.
PriorityQueue -> ... -> Vec<Bucket<K, V>> -> Bucket { hash: HashValue, key: K, value: V }
So they get stored on the heap, but the thing is that my nodes first get created on stack (I never heap allocate them using Box, or their lifetime is just bound to the stack I guess?).
Then their previous field gets assigned a raw pointer to the previous node *const Node and then they get moved/pushed in the vector that is returned from the function (now they are on the heap).
Nodes: stack -> Vec
Code of function that returns that Vec:
fn neighbours(&self, maze: &Maze) -> Vec<Node> {
    let offset_x: [isize; 8] = [-1, -1, 0, 1, 1, 1, 0, -1];
    let offset_y: [isize; 8] = [0, -1, -1, -1, 0, 1, 1, 1];
    let mut neighbours = Vec::new();
    for i in 0..8 {
        let nx = self.position.x() + offset_x[i];
        let ny = self.position.y() + offset_y[i];
        if Node::is_valid((nx, ny), maze) {
            let (nx, ny) = (nx as usize, ny as usize);
            let mut node = Node::new(Position((nx, ny)), self, maze.end().unwrap());
            node.previous = Some(self as *const _);
            neighbours.push(node);
        } else {
            continue;
        }
    }
    neighbours
}
Keep in mind that previous node is on stack when we assign raw pointer to it (*const Node) to our newly created nodes. (Also sidenote, the Previous node was originally popped from PriorityQueue and moved into a variable on a stack)
Previous: stack
After that, the nodes from that vector get moved into a PriorityQueue that we will call open (or they just get dropped, but that is not important).
Nodes: Vec -> open: PriorityQueue
So far so good because Previous node is not yet moved and the raw pointer should still point to it.
But after those nodes get into the PriorityQueue, the Previous node also gets moved, into another PriorityQueue; let's call that queue closed.
Previous: stack -> closed: PriorityQueue
Now my question finally is: what happens to the raw pointers that pointed to the previous node? Is this undefined behaviour, meaning that the pointer is no longer pointing to that allocation?
TLDR:
How does memory work when a non-Copy value gets moved (is it still at the same memory address)? Does only the ownership of the value change while the value stays at the same memory address, or does it change address, get a new memory allocation and just get copied there?
If this is a bit too general: what happens to my Node struct when it gets moved? Will the pointers that point to the moved previous Node be invalid?

When a node is moved, it is actually stored somewhere else, thus its address is not the same (in general, even if some optimisations could prevent that, but nothing is guaranteed).
The minimal example below shows this; the initial nodes are created on the stack, then we move them into the heap-allocated storage of a vector.
Displaying the address of all these nodes shows that these addresses are not the same indeed.
The previous member of n1, initialised with the address of n0 when it was on the stack, becomes a dangling pointer as soon as n0 is moved to the vector.
The location of n0 could have been reused for some other variables, and what happens when accessing this memory location through the pointer is actually undefined.
That's why dereferencing a raw pointer is an unsafe operation in Rust (and that's also probably why Rust exists).
Independently of the Copy trait, the difference between copying a value and moving a value is subtle.
In this case, Node is only made of basic types (an integer and an optional raw pointer) and does not perform memory management with this pointer (as Vec does with the pointer it contains).
Thus, moving a Node happens to be physically the same thing as copying it: more or less a bitwise copy of the members of the struct as if Node was Copy.
When it comes to a Vec that we would like to move for example, the move operation would have been much more interesting.
The new location of the Vec (where it is moved-to) keeps the pointer (bitwise copy) to the heap-allocated storage as known in the initial location (where it is moved-from).
Then, this moved-from location is not considered anymore to be the owner of this heap-allocated storage: we have transferred ownership of the heap-allocated storage between the moved-from and moved-to locations, but we did not have to duplicate this heap-allocated storage.
This is the main interest of the move operation over the copy/clone (saving the duplication of heap-allocated storage by simply transferring ownership).
This situation is visible in the example below, when nodes is moved to nodes_again: the Vec itself is moved, but the Nodes it contains are not.
This section of « The Rust Programming Language » explains very well this situation, using a String instead of a Vec.
#[derive(Debug)]
struct Node {
    value: i32,
    previous: Option<*const Node>,
}
fn main() {
    let n0 = Node {
        value: 0,
        previous: None,
    };
    let n1 = Node {
        value: 1,
        previous: Some(&n0 as *const _),
    };
    println!("n0 at {:?}: {:?}", &n0 as *const _, n0);
    println!("n1 at {:?}: {:?}", &n1 as *const _, n1);
    println!("~~~~~~~~");
    let nodes = vec![n0, n1];
    println!("nodes at {:?}", &nodes as *const _);
    for (i, n) in nodes.iter().enumerate() {
        println!("nodes[{}] at {:?}: {:?}", i, n as *const _, n);
    }
    println!("~~~~~~~~");
    let nodes_again = nodes;
    println!("nodes_again at {:?}", &nodes_again as *const _);
    for (i, n) in nodes_again.iter().enumerate() {
        println!("nodes_again[{}] at {:?}: {:?}", i, n as *const _, n);
    }
}
/*
n0 at 0x7ffd2dd14520: Node { value: 0, previous: None }
n1 at 0x7ffd2dd14500: Node { value: 1, previous: Some(0x7ffd2dd14520) }
~~~~~~~~
nodes at 0x7ffd2dd144e8
nodes[0] at 0x558bad50fba0: Node { value: 0, previous: None }
nodes[1] at 0x558bad50fbb8: Node { value: 1, previous: Some(0x7ffd2dd14520) }
~~~~~~~~
nodes_again at 0x7ffd2dd14490
nodes_again[0] at 0x558bad50fba0: Node { value: 0, previous: None }
nodes_again[1] at 0x558bad50fbb8: Node { value: 1, previous: Some(0x7ffd2dd14520) }
*/
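As a complement to the Vec move shown above, here is a minimal sketch of the same effect with Box: moving a Box moves only the pointer, so the heap address of the boxed Node stays the same. The helper name address_survives_move is made up for this illustration, and the sketch only compares addresses; dereferencing the stored raw pointer would still need unsafe and remains the programmer's responsibility.

```rust
struct Node {
    value: i32,
    previous: Option<*const Node>,
}

// Hypothetical helper: checks that a boxed node keeps its heap address
// when the Box itself is moved into a Vec.
fn address_survives_move() -> bool {
    let n0 = Box::new(Node { value: 0, previous: None });
    // Address of the heap allocation owned by the Box.
    let before = &*n0 as *const Node;
    let n1 = Box::new(Node { value: 1, previous: Some(before) });

    // Only the two Box pointers are moved into the Vec's storage;
    // the Node values stay where they were allocated.
    let nodes = vec![n0, n1];
    let after = &*nodes[0] as *const Node;

    println!(
        "node {} was at {:?}, is now at {:?}; node {} stores {:?}",
        nodes[0].value, before, after, nodes[1].value, nodes[1].previous
    );
    before == after && nodes[1].previous == Some(after)
}

fn main() {
    assert!(address_survives_move());
}
```

The address stays valid only as long as the owning Box is alive; once it is dropped, the stored pointer dangles again.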

Related

How to traverse character elements of *const char pointer in Rust?

I'm new to Rust programming and I have a bit of difficulty where this language differs from C. For example, I have a C function as follows:
bool check(char* data, int size){
    int i;
    for(i = 0; i < size; i++){
        if( data[i] != 0x00){
            return false;
        }
    }
    return true;
}
How can I convert this function to Rust? I tried it like C, but it has Errors :((
First off, I assume that you want to use as little unsafe code as possible. Otherwise there really isn't any reason to use Rust in the first place, as you forfeit all the advantages it brings you.
Depending on what data represents, there are multiple ways to transfer this to Rust.
First off: Using pointer and length as two separate arguments is not possible in Rust without unsafe. It has the same concept, though; it's called slices. A slice is exactly the same as a pointer-size combination, just that the compiler understands it and checks it for correctness at compile time.
That said, a char* in C could actually be one of four things. Each of those things map to different types in Rust:
Binary data whose deallocation is taken care of somewhere else (in Rust terms: borrowed data)
    maps to &[u8], a slice. The actual content of the slice is:
        the address of the data as *const u8 (hidden from the user)
        the length of the data as usize
Binary data that has to be deallocated within this function after using it (in Rust terms: owned data)
    maps to Vec<u8>; as soon as it goes out of scope the data is deleted
    actual content is:
        the address of the data as *mut u8 (hidden from the user)
        the length of the data as usize
        the size of the allocation as usize. This allows for efficient push()/pop() operations. It is guaranteed that the length of the data does not exceed the size of the allocation.
A string whose deallocation is taken care of somewhere else (in Rust terms: a borrowed string)
    maps to &str, a so-called string slice.
    This is identical to &[u8] with the additional compile time guarantee that it contains valid UTF-8 data.
A string that has to be deallocated within this function after using it (in Rust terms: an owned string)
    maps to String
    same as Vec<u8> with the additional compile time guarantee that it contains valid UTF-8 data.
You can create &[u8] references from Vec<u8>'s and &str references from Strings.
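A small sketch of that last point, with made-up helper names (byte_len, char_count): functions that take the borrowed types accept references to the owned ones, because &Vec<u8> coerces to &[u8] and &String coerces to &str.

```rust
// Hypothetical helpers that only need borrowed data.
fn byte_len(data: &[u8]) -> usize {
    data.len()
}

fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let owned_bytes: Vec<u8> = vec![1, 2, 3];
    let owned_text = String::from("hello");

    // Borrowing the owned values yields the slice types without copying.
    println!("{} bytes", byte_len(&owned_bytes));
    println!("{} chars", char_count(&owned_text));

    // The owned values are still usable afterwards.
    println!("{:?} {}", owned_bytes, owned_text);
}
```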
Now this is the point where I have to make an assumption. Because the function that you posted checks if all of the elements of data are zero, and returns false if it finds a non-zero element, I assume the content of data is binary data. And because your function does not contain a free call, I assume it is borrowed data.
With that knowledge, this is how the given function would translate to Rust:
fn check(data: &[u8]) -> bool {
    for d in data {
        if *d != 0x00 {
            return false;
        }
    }
    true
}
fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));
    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
This is quite a direct translation; it's not really idiomatic to use for loops a lot in Rust. Good Rust code is mostly iterator based; iterators are most of the time zero-cost abstraction that can get compiled very efficiently.
This is how your code would look like if rewritten based on iterators:
fn check(data: &[u8]) -> bool {
    data.iter().all(|el| *el == 0x00)
}
fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));
    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
The reason this is more idiomatic is that it's a lot easier to read for someone who hasn't written it. It clearly says "return true if all elements are equal to zero". The for based code needs a second to think about to understand if its "all elements are zero", "any element is zero", "all elements are non-zero" or "any element is non-zero".
Note that with optimizations enabled, both versions typically compile to the same or nearly the same machine code.
Also note that, unlike the C version, the Rust borrow checker guarantees at compile time that data is valid. It's impossible in Rust (without unsafe) to produce a double free, a use-after-free, an out-of-bounds memory access or any other kind of undefined behaviour that would cause memory corruption.
This is also the reason why Rust doesn't do pointers without unsafe - it needs the length of the data to check for out-of-bounds errors at runtime. That means accessing data via the [] operator can be a little more costly in Rust (as it performs a bounds check unless the compiler can prove it unnecessary), which is one reason why iterator based programming is a thing. Iterators can often iterate over data more efficiently than directly accessing it via the [] operator.
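If the data really does arrive as a pointer plus a length (for instance across a C boundary), one possible sketch is to convert it into a slice once, inside a small unsafe wrapper, and keep the rest of the code safe. The name check_raw, the use of *const u8 rather than a C char type, and the choice to treat a null pointer or zero length as "all zero" are assumptions made for this illustration.

```rust
use std::slice;

fn check(data: &[u8]) -> bool {
    data.iter().all(|&b| b == 0x00)
}

/// Hypothetical pointer-and-length entry point.
///
/// # Safety
/// `data` must be null, or point to `size` readable, initialized bytes
/// that stay valid and unmodified for the duration of the call.
unsafe fn check_raw(data: *const u8, size: usize) -> bool {
    if data.is_null() || size == 0 {
        // Assumption: nothing to inspect counts as "all zero".
        return true;
    }
    // SAFETY: the caller promises `data` points to `size` valid bytes.
    let bytes = unsafe { slice::from_raw_parts(data, size) };
    check(bytes)
}

fn main() {
    let zeros = [0u8; 4];
    let mixed = [0u8, 7, 0];
    let a = unsafe { check_raw(zeros.as_ptr(), zeros.len()) };
    let b = unsafe { check_raw(mixed.as_ptr(), mixed.len()) };
    println!("zeros: {}, mixed: {}", a, b);
}
```

The unsafe part is limited to the one place where the pointer and length are trusted; everything after that works on an ordinary slice.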

Why must pointers used by `offset_from` be derived from a pointer to the same object?

From the standard library:
Both pointers must be derived from a pointer to the same object. (See below for an example.)
let ptr1 = Box::into_raw(Box::new(0u8));
let ptr2 = Box::into_raw(Box::new(1u8));
let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
// Make ptr2_other an "alias" of ptr2, but derived from ptr1.
let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
assert_eq!(ptr2 as usize, ptr2_other as usize);
// Since ptr2_other and ptr2 are derived from pointers to different
// objects, computing their offset is undefined behavior, even though
// they point to the same address!
unsafe {
    let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
}
I do not understand why this must be the case.
This has to do with a concept called "provenance", meaning "the place of origin". The Rust Unsafe Code Guidelines has a section on Pointer Provenance. It's a pretty abstract rule, but it explains that provenance is an extra bit of information, used during compilation, that helps determine which pointer transformations are well defined.
// Let's assume the two allocations here have base addresses 0x100 and 0x200.
// We write pointer provenance as `#N` where `N` is some kind of ID uniquely
// identifying the allocation.
let raw1 = Box::into_raw(Box::new(13u8));
let raw2 = Box::into_raw(Box::new(42u8));
let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
// These pointers now have the following values:
// raw1 points to address 0x100 and has provenance #1.
// raw2 points to address 0x200 and has provenance #2.
// raw2_wrong points to address 0x200 and has provenance #1.
// In other words, raw2 and raw2_wrong have same *address*...
assert_eq!(raw2 as usize, raw2_wrong as usize);
// ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*:
// it points to address 0x200, which is in allocation #2, but the pointer
// has provenance #1.
The guidelines link to a good article: Pointers Are Complicated and its follow up Pointers Are Complicated II that go into more detail and coined the phrase:
Just because two pointers point to the same address, does not mean they are equal and can be used interchangeably.
Essentially, it is invalid to read a value via a pointer that is outside that pointer's original "allocation" even if you can guarantee a valid object exists there. Allowing such behavior could wreak havoc on the language's aliasing rules and possible optimizations. And there's pretty much never a good reason to do it.
This concept is mostly inherited from C and C++.
If you're curious whether you've written code that violates this rule, running it through Miri, the undefined behavior analysis tool, can often find it.
fn main() {
    let ptr1 = Box::into_raw(Box::new(0u8));
    let ptr2 = Box::into_raw(Box::new(1u8));
    let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
    let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
    assert_eq!(ptr2 as usize, ptr2_other as usize);
    unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
}
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
--> src/main.rs:7:49
|
7 | unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
| ^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
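For contrast, here is a minimal sketch of a use of offset_from that satisfies the rule: both pointers are derived from the same array, so they share provenance and stay in bounds. The function name element_distance is made up for this illustration.

```rust
// Hypothetical helper: distance between two elements of one array.
fn element_distance() -> isize {
    let data = [10u8, 20, 30, 40];
    let first = data.as_ptr();
    // SAFETY: index 2 is inside the 4-element array `data`.
    let third = unsafe { first.add(2) };
    // SAFETY: both pointers are derived from `data` and are in bounds,
    // so computing their offset is well defined.
    unsafe { third.offset_from(first) }
}

fn main() {
    println!("distance: {}", element_distance());
}
```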

Getting pointer by &str

Consider this pseudocode:
let k = 10;
let ptr = &k as *const i32;
println!("{:p}", ptr); // prints address of pointer
let addr = format!("{:p}", ptr);
super-unsafe {
// this would obviously be super unsafe. It may even cause a STATUS_ACCESS_VIOLATION if you try getting memory from a page that the OS didn't allocate to the program!
let ptr_gen = PointerFactory::from_str(addr.as_str());
assert_eq!(k, *ptr_gen);
}
The pseudocode gets the idea across: I want to be able to get a pointer to a certain memory address by its &str representation. Is this... possible?
So essentially what you want to do is parse the string back to an integer (usize) and then interpret that value as a pointer/reference†:
fn main()
{
    let i = 12i32;
    let r = format!("{:p}", &i);
    let x = unsafe
    {
        let r = r.trim_start_matches("0x");
        &*(usize::from_str_radix(&r, 16).unwrap() as *const i32)
    };
    println!("{}", x);
}
You can try this yourself in the playground.
†As you can see, you don't even need to cast your reference into a raw pointer, the {:p} formatter takes care of representing it as a memory location (index).
Update: As E_net4 mentioned in the comment section, it is better to use usize here, because its width matches the pointer size of the target architecture, unlike a fixed-width integer type. The transmute was not necessary, so I removed it. The third point about undefined behaviour, however, seems obvious to whoever tries to do something like the above. This answer provides a way to achieve what the OP asked for, which doesn't mean it should be used for anything other than academic/experimental purposes :)
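The parsing step on its own is entirely safe, so it can be separated from the dangerous part. A minimal sketch, assuming a made-up helper parse_address that returns None for malformed input instead of panicking; it only recovers the numeric address and deliberately does not dereference anything.

```rust
// Hypothetical helper: turns text such as "0x7ffd2dd14520" into an address.
fn parse_address(text: &str) -> Option<usize> {
    let digits = text.strip_prefix("0x")?;
    usize::from_str_radix(digits, 16).ok()
}

fn main() {
    let value = 12i32;
    // `{:p}` formats the reference as a hexadecimal address with a 0x prefix.
    let text = format!("{:p}", &value);
    let parsed = parse_address(&text);

    // The round trip gives back the same numeric address.
    assert_eq!(parsed, Some(&value as *const i32 as usize));
    println!("{} -> {:?}", text, parsed);
}
```

Turning the number back into a pointer and reading through it is the step that needs unsafe, and it is only sound while the original value is still alive at that address.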

How does Rust know which types own resources?

When one has a box pointer to some heap-allocated memory, I assume that Rust has 'hardcoded' knowledge of ownership, so that when ownership is transferred by calling some function, the resources are moved and the argument in the function is the new owner.
However, how does this happen for vectors for example? They too 'own' their resources, and ownership mechanics apply like for box pointers -- yet they are regular values stored in variables themselves, and not pointers. How does Rust (know to) apply ownership mechanics in this situation?
Can I make my own type which owns resources?
tl;dr: "owning" types in Rust are not some magic and they are most certainly not hardcoded into the compiler or language. They are just types which are written in a certain way (they do not implement Copy and likely have a destructor) and have certain semantics which are enforced through non-copyability and the destructor.
In its core Rust's ownership mechanism is very simple and has very simple rules.
First of all, let's define what move is. It is simple - a value is said to be moved when it becomes available under a new name and stops being available under the old name:
struct X(u32);
let x1 = X(12);
let x2 = x1;
// x1 is no longer accessible here, trying to use it will cause a compiler error
Same thing happens when you pass a value into a function:
fn do_something(x: X) {}
let x1 = X(12);
do_something(x1);
// x1 is no longer accessible here
Note that there is absolutely no magic here - it is just that by default every value of every type behaves like in the above examples. Values of each struct or enum you or someone else creates by default will be moved.
Another important thing is that you can give every type a destructor, that is, a piece of code which is invoked when the value of this type goes out of scope and destroyed. For example, destructors associated with Vec or Box will free the corresponding piece of memory. Destructors can be declared by implementing Drop trait:
struct X(u32);
impl Drop for X {
    fn drop(&mut self) {
        println!("Dropping {}", self.0);
    }
}
{
    let x1 = X(12);
} // x1 is dropped here, and "Dropping 12" will be printed
There is a way to opt-out of non-copyability by implementing Copy trait which marks the type as automatically copyable - its values will no longer be moved but copied:
#[derive(Copy, Clone)] struct X(u32);
let x1 = X(12);
let x2 = x1;
// x1 is still available here
The copy is done bytewise - x2 will contain a byte-identical copy of x1.
Not every type can be made Copy - only those which have Copy interior and do not implement Drop. All primitive types (except &mut references but including *const and *mut raw pointers) are Copy in Rust, so each struct which contains only primitives can be made Copy. On the other hand, structs like Vec or Box are not Copy - they deliberately do not implement it because bytewise copy of them will lead to double frees because their destructors can be run twice over the same pointer.
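A minimal sketch of that rule, using a made-up struct Link: because it contains only a u32 and a raw pointer, Copy can be derived, and the original stays usable after the assignment. The raw pointer is only stored and compared here, never dereferenced.

```rust
// Hypothetical struct made only of Copy fields, so it can derive Copy.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Link {
    id: u32,
    prev: *const u32,
}

fn original_still_usable() -> bool {
    let target = 7u32;
    let a = Link { id: 1, prev: &target as *const u32 };
    let b = a; // bytewise copy, not a move
    // `a` can still be read because Link is Copy.
    a == b && a.id == 1
}

fn main() {
    println!("original still usable: {}", original_still_usable());
}
```

Adding a Vec or Box field to Link, or implementing Drop for it, would make the derive fail to compile.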
The Copy bit above is a slight digression on my side, just to give a clearer picture. Ownership in Rust is based on move semantics. When we say that some value owns something, like in "Box<T> owns the given T", we mean a semantic connection between them, not something magical or something which is built into the language. It is just that most such values, like Vec or Box, do not implement Copy and are thus moved instead of copied, and they also (optionally) have a destructor which cleans up anything these types may have allocated for them (memory, sockets, files, etc.).
Given the above, of course you can write your own "owning" types. This is one of the cornerstones of idiomatic Rust, and a lot of code in the standard library and external libraries is written in such way. For example, some C APIs provide functions for creating and destroying objects. Writing an "owning" wrapper around them is very easy in Rust and it is probably very close to what you're asking for:
extern {
    fn create_widget() -> *mut WidgetStruct;
    fn destroy_widget(w: *mut WidgetStruct);
    fn use_widget(w: *mut WidgetStruct) -> u32;
}
struct Widget(*mut WidgetStruct);
impl Drop for Widget {
    fn drop(&mut self) {
        unsafe { destroy_widget(self.0); }
    }
}
impl Widget {
    fn new() -> Widget { Widget(unsafe { create_widget() }) }
    fn use_it(&mut self) -> u32 {
        unsafe { use_widget(self.0) }
    }
}
Now you can say that Widget owns some foreign resource represented by *mut WidgetStruct.
Here is another example of how a value might own memory and free it when the value is destroyed:
extern crate libc;
use libc::{malloc, free, c_void};
struct OwnerOfMemory {
    ptr: *mut c_void
}
impl OwnerOfMemory {
    fn new() -> OwnerOfMemory {
        OwnerOfMemory {
            ptr: unsafe { malloc(128) }
        }
    }
}
impl Drop for OwnerOfMemory {
    fn drop(&mut self) {
        unsafe { free(self.ptr); }
    }
}
fn main() {
    let value = OwnerOfMemory::new();
}
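The two examples above depend on external code (a C widget library, the libc crate). As a std-only sketch of the same idea, the made-up type Resource below counts how often its destructor runs; moving the value twice still leads to exactly one drop, which is what makes it behave like an owner.

```rust
use std::cell::Cell;

// Hypothetical owning type: its "resource" is just a shared drop counter.
struct Resource<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Takes ownership; the value is dropped when this function returns.
fn consume(_resource: Resource<'_>) {}

fn drops_after_moves() -> u32 {
    let drops = Cell::new(0);
    let r1 = Resource { drops: &drops };
    let r2 = r1; // move: r1 is no longer accessible
    consume(r2); // move into the function, dropped at its end
    drops.get()
}

fn main() {
    println!("destructor ran {} time(s)", drops_after_moves());
}
```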

How to allocate space for a Vec<T> in Rust?

I want to create a Vec<T> and make some room for it, but I don't know how to do it, and, to my surprise, there is almost nothing in the official documentation about this basic type.
let mut v: Vec<i32> = Vec<i32>(SIZE); // How do I do this ?
for i in 0..SIZE {
    v[i] = i;
}
I know I can create an empty Vec<T> and fill it with pushes, but I don't want to do that since I don't always know, when writing a value at index i, if a value was already inserted there yet. I don't want to write, for obvious performance reasons, something like :
if i >= len(v) {
    v.push(x);
} else {
    v[i] = x;
}
And, of course, I can't use the vec! syntax either.
While vec![elem; count] from the accepted answer is sufficient to create a vector with all elements equal to the same value, there are other convenience functions.
Vec::with_capacity() creates a vector with the given capacity but with zero length. It means that until this capacity is reached, push() calls won't reallocate the vector, making push() essentially free:
fn main() {
    let mut v = Vec::with_capacity(10);
    for i in 0..10 {
        v.push(i);
    }
    println!("{:?}", v);
}
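One detail worth spelling out: with_capacity() reserves memory but the vector is still empty, so indexing into it would panic. A small sketch with a made-up helper name, len_and_capacity:

```rust
// Hypothetical helper: reports the length and whether the requested
// capacity was reserved.
fn len_and_capacity() -> (usize, bool) {
    let v: Vec<i32> = Vec::with_capacity(10);
    // The allocator may reserve more than requested, never less.
    (v.len(), v.capacity() >= 10)
}

fn main() {
    let (len, has_room) = len_and_capacity();
    println!("len = {}, room for 10 = {}", len, has_room);

    let v: Vec<i32> = Vec::with_capacity(10);
    // There are no elements yet, so a checked access finds nothing;
    // `v[0]` would panic here.
    println!("{:?}", v.get(0));
}
```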
You can also easily collect() a vector from an iterator. Example:
fn main() {
    let v: Vec<_> = (1..10).collect();
    println!("{:?}", v);
}
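And finally, sometimes your vector contains values of primitive type and is supposed to be used as a buffer (e.g. in network communication). In this case you can use Vec::with_capacity() + the unsafe set_len() method. The safety contract of set_len() requires every element below the new length to be initialized before the call, so the spare capacity is written first:
fn main() {
    let mut v: Vec<usize> = Vec::with_capacity(10);
    // Initialize the first 10 slots of the spare capacity in place.
    for (i, slot) in v.spare_capacity_mut().iter_mut().take(10).enumerate() {
        slot.write(i);
    }
    // SAFETY: 10 <= capacity and elements 0..10 were just initialized.
    unsafe { v.set_len(10); }
    println!("{:?}", v);
}
Note that you have to be extra careful if your vector contains values with destructors or references - it's easy to get a destructor run over an uninitialized piece of memory or to get an invalid reference this way. It will work right only if you use the initialized part of the vector (you have to track it yourself now). To read about all the possible dangers of uninitialized memory, you can read the documentation of mem::MaybeUninit (which replaces the deprecated mem::uninitialized()).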
You can use the first syntax of the vec! macro, specifically vec![elem; count]. For example:
vec![1; 10]
will create a Vec<_> containing 10 1s (the type _ will be determined later or default to i32). The elem given to the macro must implement Clone. The count can be a variable, too.
There is the Vec::resize method:
fn resize(&mut self, new_len: usize, value: T)
This code resizes an empty vector to 1024 elements by filling with the value 7:
let mut vec: Vec<i32> = Vec::new();
vec.resize(1024, 7);
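Applied to the loop from the question, a minimal sketch could look like this: the vector is created with its final length up front, so every index below that length can be assigned directly without any push. The function name filled and the element type usize are choices made for this illustration.

```rust
// Hypothetical helper: builds a vector of `size` elements and then
// overwrites each slot by index, as in the question.
fn filled(size: usize) -> Vec<usize> {
    // All `size` elements exist (as zeros) after this line.
    let mut v = vec![0; size];
    for i in 0..size {
        v[i] = i;
    }
    v
}

fn main() {
    println!("{:?}", filled(5));
}
```

The initial fill with zeros costs one pass over the memory; in exchange, every later write by index stays in safe code.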

Resources