If I return a Vec buffer and a pointer to its internal data, is the pointer valid?

I'm writing some C FFI bindings and ran into a situation where I'm not sure whether the code is sound. In its simplest form, it would be:
unsafe fn foo() -> (*const u8, Vec<u8>) {
let buf = vec![0, 1, 2];
(buf.as_ptr(), buf)
}
Now using it:
fn main() {
let (ptr, _buf) = unsafe { foo() }; // foo is an unsafe fn, so the call needs an unsafe block
// pass ptr to C function...
}
In the example above, is ptr valid, since _buf lives until the end of the scope?

The question is whether moving the Vec invalidates the pointer into it. The answer is that this has not been decided yet.
It is tracked as UCG (Unsafe Code Guidelines) issue #326.
So it is best to avoid code like this until it is decided. For what it's worth, a lot of existing code relies on this working, so I don't believe it will be declared invalid.
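One way to sidestep the open question, as a minimal sketch: return only the Vec and derive the pointer after the buffer has reached its final owner, so no move happens between as_ptr() and the use of the pointer. The make_buf name and the raw read standing in for the C call are assumptions for illustration.

```rust
/// Hypothetical helper: builds the buffer and returns it by value.
fn make_buf() -> Vec<u8> {
    vec![0, 1, 2]
}

fn main() {
    let buf = make_buf();
    // Derive the pointer only after `buf` has been moved into its final binding.
    let ptr: *const u8 = buf.as_ptr();
    let len = buf.len();

    // Stand-in for the C call: read through the pointer while `buf` is alive.
    for i in 0..len {
        // SAFETY: `i < len`, and `buf` is not moved, mutated, or dropped here.
        let byte = unsafe { *ptr.add(i) };
        assert_eq!(byte, buf[i]);
    }
    // `buf` is dropped at the end of the scope, after the last use of `ptr`.
}
```

The pointer and the length would be passed to the C function together, and the Vec must not be resized while C still holds the pointer, since a reallocation would leave it dangling.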

Related

Does the key of a HashMap have to be a pointer in Rust?

I'm admittedly new to Rust. That said, this doesn't make sense to me yet, and finding out why the behavior I'm seeing isn't what I expect seems like a good learning opportunity:
use std::iter::Enumerate;
use std::collections::HashMap;
impl Solution {
pub fn two_sum(nums: Vec<i32>, target: i32) -> Vec<i32> {
let mut numToI: HashMap<i32, usize> = HashMap::new();
for (i,v) in nums.iter().enumerate() {
let num: i32 = *v;
let complement: i32 = target - num;
if numToI.contains_key(complement) {
return vec![i as i32, numToI.get(complement) as i32];
} else {
numToI.insert(complement, i);
}
}
return vec![-1,-1];
}
}
Here I'm doing the simple question twoSum. I understand that nums.iter().enumerate() will return the values i and v, which are of type usize and a pointer to the element in nums (so in this case a reference to an i32), respectively. The thing I'm having trouble with is that although I specify numToI is a HashMap<i32, usize>, not HashMap<&i32, usize>, and I dereference to get the value of v with *v and assign the value to num, when I check if the HashMap numToI contains this i32 dereferenced value as a key, I get the error: expected &i32, found i32 on the call to contains_key. Why is this? Is it because the HashMap type always requires a pointer rather than a raw value, or is it due to an intricacy of Rust I'm not aware of? Shouldn't it expect a pointer for the key instead of a i32 if I had used HashMap<&i32, i32>?
More importantly, if this is due to a difference between Rust and C that has to do with the way borrowing etc. is used in Rust, where can I learn more about the intricacies of these differences?
contains_key takes a reference. It doesn't need to take ownership of the value to test with - it just needs to look at it temporarily.
Rust is complaining that you are passing in an i32 by value instead of a reference to it. It should tell you to borrow instead: numToI.contains_key(&complement).
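That is the error the compiler reports first, but the same applies to get, which also takes a reference and returns an Option<&usize>, so numToI.get(&complement) has to be unwrapped before it can be cast to i32. HashMap keys don't need to be references, and it would be really inconvenient if they did.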
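For reference, here is a sketch of how the function might look once the lookups borrow the key. It is written as a free function rather than a method on Solution, and it stores each number (not its complement) in the map, which is my assumption about the intended logic.

```rust
use std::collections::HashMap;

/// Returns the indices of two numbers that add up to `target`, or [-1, -1].
fn two_sum(nums: Vec<i32>, target: i32) -> Vec<i32> {
    // The map owns plain i32 keys; only the lookups borrow them.
    let mut num_to_i: HashMap<i32, usize> = HashMap::new();
    for (i, &num) in nums.iter().enumerate() {
        let complement = target - num;
        // `get` takes `&i32` and returns `Option<&usize>`.
        if let Some(&j) = num_to_i.get(&complement) {
            return vec![j as i32, i as i32];
        }
        // `insert` takes the key by value because the map needs to own it.
        num_to_i.insert(num, i);
    }
    vec![-1, -1]
}

fn main() {
    println!("{:?}", two_sum(vec![2, 7, 11, 15], 9));
}
```

The asymmetry is deliberate in the HashMap API: insert consumes the key, while get and contains_key only need to look at it.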

How to pass a Reference / Pointer to a Rust Struct to a C ffi interface?

What I am trying to do
I have built a Rust interface, with which I want to interact via C (or C# but it does not really matter for the sake of the question). Because it does not seem to be possible to make a Rust Struct accessible to C I am trying to build some wrapper functions that I can call and that will create the Struct in Rust, call functions of the struct and eventually free the Struct from memory manually.
In order to do this I thought I would pass the pointer to the Struct instance that I create in the init function back to C (or C# and temporary store it as an IntPtr). Then when I call the other functions I would pass the pointer to Rust again, dereference it and call the appropriate functions on the dereferenced Struct, mutating it in the process.
I know that I will have to use unsafe code to do this and I am fine with that. I should probably also point out that I don't know a lot about lifetime management in Rust, and it might very well be that what I am trying to do is impossible, because it is quite easy to produce a dangling pointer somewhere. In that case, I wonder how I would need to adjust my approach, because I don't think I am the first person trying to mutate some sort of state inside Rust from C.
What I tried first
So first of all I made sure to output the correct library and add my native functions to it. In the Cargo.toml I set the lib type to:
[lib]
crate-type = ["cdylib"]
Then I created some functions to interact with the struct and exposed them like this:
#[no_mangle]
pub extern fn init() -> *mut MyStruct {
let mut struct_instance = MyStruct::default();
struct_instance.init();
let raw_pointer_mut = &mut struct_instance as *mut MyStruct;
return raw_pointer_mut;
}
#[no_mangle]
pub extern fn add_item(struct_instance_ref: *mut MyStruct) {
unsafe {
let struct_instance = &mut *struct_instance_ref;
struct_instance.add_item();
}
}
As you can see in the init function I am creating the struct and then I return the (mutable) pointer.
I then take the pointer in the add_item function and use it.
Now I tried to test this implementation, because I had some doubts about the pointer still being valid. In another Rust module I loaded the .dll and .lib files (I am on Windows, but that should not matter for the question) and then called the functions like so:
fn main() {
unsafe {
let struct_pointer = init();
add_item(struct_pointer);
println!("The pointer address: {:?}", struct_pointer);
}
}
#[link(name = "my_library.dll")]
extern {
fn init() -> *mut u32;
fn add_item(struct_ref: *mut u32);
}
What happened: I did get a memory address as output and (because I am actually creating a file in the real implementation) I could also see that the functions were executed as planned. However, the Struct's fields do not seem to be mutated. They were basically all empty, which they should not have been after I called the add_item function (or after I called the init function).
What I tried after that
I read a bit on life-time management in Rust and therefore tried to allocate the Struct on the heap by using a Box like so:
#[no_mangle]
pub extern fn init() -> *mut Box<MyStruct> {
let mut struct_instance = MyStruct::default();
struct_instance.init();
let raw_pointer_mut = &mut Box::new(struct_instance) as *mut Box<MyStruct>;
return raw_pointer_mut;
}
#[no_mangle]
pub extern fn add_box(struct_instance_ref: *mut Box<MyStruct>) {
unsafe {
let struct_instance = &mut *struct_instance_ref;
struct_instance.add_box();
}
}
Unfortunately the result was the same as above.
Additional Information
I figured it might be good to also include how the Struct is made up in principle:
#[derive(Default)]
#[repr(C)]
pub struct MyStruct{
// Some fields...
}
impl MyStruct{
/// Initializes a new struct.
pub fn init(&mut self) {
self.some_field = whatever;
}
/// Adds an item to the struct.
pub fn add_item(
&mut self,
maybe_more_data: of_type // Obviously the call in the external function would need to be adjusted to accommodate that...
){
some_other_function(self); // Calls another function in Rust, that will take the struct instance as an argument and mutate it.
}
}
Rust has a strong notion of ownership. Ask yourself: who owns the MyStruct instance? It's the struct_instance variable, whose lifetime is the scope of the init() function. So after init() returns, the instance is dropped and an invalid pointer is returned.
Allocating the MyStruct on the heap is the solution, but not in the way you tried: the instance is moved to the heap, but the Box wrapper is a temporary tied to the same problematic scope, so when init() returns the Box is dropped and destroys the heap-allocated object.
A solution is to use Box::into_raw to take the heap-allocated value back out of the box before the box is dropped:
#[no_mangle]
pub extern fn init() -> *mut MyStruct {
let mut struct_instance = MyStruct::default();
struct_instance.init();
let boxed = Box::new(struct_instance); // `box` is a reserved keyword, so it cannot be used as a variable name
Box::into_raw(boxed)
}
To destroy the value later, use Box::from_raw to create a new Box that owns it, then let that box deallocate its contained value when it goes out of scope:
#[no_mangle]
pub extern fn destroy(struct_instance: *mut MyStruct) {
unsafe { drop(Box::from_raw(struct_instance)); } // rebuild the Box and drop it explicitly to free the value
}
This seems like a common problem, so there might be a more idiomatic solution. Hopefully someone more experienced will chime in.
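Putting the pieces together, a minimal sketch of the whole handle pattern could look like the following. The items field, the my_struct_* function names, and the u32 item type are assumptions for illustration, since the question elides the real fields.

```rust
/// Hypothetical stand-in for the question's struct; the `items` field is assumed.
#[derive(Default)]
pub struct MyStruct {
    items: Vec<u32>,
}

impl MyStruct {
    pub fn add_item(&mut self, item: u32) {
        self.items.push(item);
    }
}

// When building a cdylib, each function below would also carry #[no_mangle]
// (spelled #[unsafe(no_mangle)] in the 2024 edition) so C can find the symbol.

/// Allocates a MyStruct on the heap and hands ownership to the caller.
pub extern "C" fn my_struct_new() -> *mut MyStruct {
    Box::into_raw(Box::new(MyStruct::default()))
}

/// Mutates the struct behind the handle; a null handle is ignored.
pub unsafe extern "C" fn my_struct_add_item(handle: *mut MyStruct, item: u32) {
    // SAFETY: the caller promises `handle` came from my_struct_new and was not freed.
    if let Some(instance) = unsafe { handle.as_mut() } {
        instance.add_item(item);
    }
}

/// Reports how many items are stored; a null handle counts as empty.
pub unsafe extern "C" fn my_struct_len(handle: *const MyStruct) -> usize {
    // SAFETY: same contract as my_struct_add_item.
    unsafe { handle.as_ref() }.map_or(0, |instance| instance.items.len())
}

/// Takes ownership back and frees the struct; must be called exactly once.
pub unsafe extern "C" fn my_struct_free(handle: *mut MyStruct) {
    if !handle.is_null() {
        // SAFETY: the handle was produced by Box::into_raw in my_struct_new.
        drop(unsafe { Box::from_raw(handle) });
    }
}

fn main() {
    let handle = my_struct_new();
    // SAFETY: `handle` is valid until my_struct_free is called below.
    unsafe {
        my_struct_add_item(handle, 7);
        my_struct_add_item(handle, 8);
        assert_eq!(my_struct_len(handle), 2);
        my_struct_free(handle);
    }
}
```

On the C or C# side the handle is treated as an opaque pointer (an IntPtr in C#): it is only ever passed back to these functions and never dereferenced.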
I'm adding a simple answer for anyone who comes across this question but doesn't need to box: &mut struct_instance as *mut _ is the correct syntax to get a mutable pointer to a struct on the stack. This syntax is a bit tricky to find documented anywhere, and it's easy to miss the initial mut.
Notably, this does not solve the original poster's issue, as returning a pointer to a local is undefined behavior. However, this is the correct solution for calling something via FFI (for which there don't seem to be any better results on Google).

Getting pointer by &str

Consider this pseudocode:
let k = 10;
let ptr = &k as *const i32;
println!("{:p}", ptr); // prints the address stored in ptr, i.e. the address of k
let addr = format!("{:p}", ptr);
super-unsafe {
// this would obviously be super unsafe. It may even cause a STATUS_ACCESS_VIOLATION if you try getting memory from a page that the OS didn't allocate to the program!
let ptr_gen = PointerFactory::from_str(addr.as_str());
assert_eq!(k, *ptr_gen);
}
The pseudocode gets the idea across: I want to be able to get a pointer to a certain memory address by its &str representation. Is this... possible?
So essentially what you want to do is parse the string back to an integer (usize) and then interpret that value as a pointer/reference†:
fn main()
{
let i = 12i32;
let r = format!("{:p}", &i);
let x = unsafe
{
let r = r.trim_start_matches("0x");
&*(usize::from_str_radix(&r, 16).unwrap() as *const i32)
};
println!("{}", x);
}
You can try this yourself in the playground.
†As you can see, you don't even need to cast your reference into a raw pointer, the {:p} formatter takes care of representing it as a memory location (index).
Update: As E_net4 mentioned in the comment section, it is better to use usize here, because its width follows the target architecture's pointer size, unlike a fixed-width integer type. The transmute was not necessary, so I removed it. The third point about undefined behaviour, however, seems obvious to whoever tries to do something like the above. This answer provides a way to achieve what the OP asked for, which doesn't mean it should be used for anything other than academic/experimental purposes :)
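The parsing step can also be separated from the unsafe dereference, which makes it easier to handle malformed input. This is only a sketch for experimentation: parse_addr is a hypothetical helper name, and dereferencing the result is only defensible here because the address was just printed from a variable that is still alive.

```rust
/// Hypothetical helper: parses the text produced by `{:p}` back into an address.
fn parse_addr(text: &str) -> Option<usize> {
    let digits = text.strip_prefix("0x").unwrap_or(text);
    usize::from_str_radix(digits, 16).ok()
}

fn main() {
    let k = 10i32;
    let text = format!("{:p}", &k);

    // Parsing itself is safe; it only yields a number.
    let addr = parse_addr(&text).expect("pointer formatting should yield valid hex");
    let ptr = addr as *const i32;
    assert_eq!(ptr, &k as *const i32);

    // SAFETY: `ptr` holds the address of `k`, which is still alive and is an i32.
    let value = unsafe { *ptr };
    assert_eq!(value, 10);
}
```

Only the final read needs unsafe. Turning an arbitrary parsed number into a pointer and reading through it remains undefined behaviour unless that address really holds a live value of the right type.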

If I want to code in Rust securely, should I code without using pointer arithmetic?

I've read that pointer arithmetic in Rust can be done through the pointer.offset() function, but it always has to be implemented in unsafe code:
fn main() {
let buf: [u32; 5] = [1, 2, 3, 4, 5];
let mut ptr1: *const u32 = buf.as_ptr();
unsafe {
let ptr2: *const u32 = buf.as_ptr().offset(buf.len() as isize);
while ptr1 < ptr2 {
println!("Address {:?} | Value {}", ptr1, *ptr1);
ptr1 = ptr1.offset(1);
}
}
}
If I want to code in Rust securely, should I code without using pointer arithmetic and just using the corresponding index of an array for example? Or is there any other way?
If I want to code in Rust securely
Then you should not use unsafe. There are a few legitimate reasons for unsafe (for example, accessing memory locations that are known to be safe to use, such as hardware registers on a microcontroller), but generally you should not use it.
should I code without using pointer arithmetic and just using the corresponding index of an array for example
Yes. There is no reason (in this specific case) to use unsafe at all. Just use
for i in 0..buf.len() {
println!("Value {}", buf[i]);
}
However, this code is not considered "rusty". Instead, iterate over the elements directly:
for i in &buf {
println!("Value {}", i);
}
Using raw pointers like that is very unlikely[1] to be faster than an idiomatic for loop over an iterator:
fn main() {
let buf: [u32; 5] = [1, 2, 3, 4, 5];
for val in buf.iter() {
println!("Address {:?} | Value {}", val as *const u32, val);
}
}
This is also much easier to read and doesn't introduce memory unsafety risks.
[1] In fact, your code compares two pointer values on each iteration, so it is likely to be much slower than the idiomatic for loop, which can often omit all bounds checks.
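Beyond plain iteration, most things people reach for pointer arithmetic to do have a safe slice counterpart. A small sketch, with adjacent_sums and tail_from as hypothetical helper names:

```rust
/// Sums each pair of neighbouring elements, replacing `*p + *p.offset(1)`.
fn adjacent_sums(buf: &[u32]) -> Vec<u32> {
    buf.windows(2).map(|pair| pair[0] + pair[1]).collect()
}

/// Returns the part of the slice starting at `start`, replacing `p.offset(start)`.
/// An out-of-range start gives an empty slice instead of undefined behaviour.
fn tail_from(buf: &[u32], start: usize) -> &[u32] {
    buf.get(start..).unwrap_or(&[])
}

fn main() {
    let buf: [u32; 5] = [1, 2, 3, 4, 5];

    // `enumerate` gives the position without any manual pointer bookkeeping.
    for (i, val) in buf.iter().enumerate() {
        println!("Index {} | Value {}", i, val);
    }

    println!("{:?}", adjacent_sums(&buf));
    println!("{:?}", tail_from(&buf, 3));
}
```

All of these are bounds-checked or checked by construction, so a wrong index produces a panic or an empty result rather than a read from invalid memory.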

How do I make the equivalent of a C double pointer in Rust?

I started porting C code to Rust, but I'm confused about how things work in Rust. What is the equivalent of this code:
typedef struct Room {
int xPos;
int yPos;
} Room;
void main (){
Room **rooms;
rooms = malloc(sizeof(Room)*8);
}
What is the equivalent of this code
Assuming you mean "a collection of Rooms with capacity for 8":
struct Room {
x_pos: i32,
y_pos: i32,
}
fn main() {
let rooms: Vec<Room> = Vec::with_capacity(8);
}
It's exceedingly rare to call the allocator directly in Rust. Generally, you have a collection that does that for you. You also don't usually explicitly specify the item type of the collection because it can be inferred by what you put in it, but since your code doesn't use rooms at all, we have to inform the compiler.
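To make that concrete, here is a minimal sketch of filling such a collection. The build_rooms helper and the positions it assigns are made up for illustration. Note that once rooms are pushed, the type annotation is no longer needed.

```rust
#[derive(Debug, PartialEq)]
struct Room {
    x_pos: i32,
    y_pos: i32,
}

/// Hypothetical helper: creates `count` rooms laid out along the x axis.
fn build_rooms(count: usize) -> Vec<Room> {
    // with_capacity reserves space up front; the Vec is still empty (len 0).
    let mut rooms = Vec::with_capacity(count);
    for i in 0..count {
        // The element type is inferred from what is pushed.
        rooms.push(Room { x_pos: i as i32, y_pos: 0 });
    }
    rooms
}

fn main() {
    let rooms = build_rooms(8);
    println!("{} rooms, first is {:?}", rooms.len(), rooms[0]);
    // The Vec frees its own allocation when `rooms` goes out of scope.
}
```

Unlike the malloc version, there is no size computation to get wrong and no matching free to remember.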
As pointed out in the comments, you don't need a double pointer. This is the equivalent of a Room *. If you really wanted an additional level of indirection, you could add Box:
let rooms: Vec<Box<Room>> = Vec::with_capacity(8);
How do I make the equivalent of a C double pointer
One of the benefits of Rust vs C is that in C, you don't know the semantics of a foo_t **. Who should free each of the pointers? Which pointers are mutable? You can create raw pointers in Rust, but even that requires specifying mutability. This is almost never what you want:
let rooms: *mut *mut Room;
In certain FFI cases, a C function accepts a foo_t ** because it wants to modify a passed-in pointer. In those cases, something like this is reasonable:
unsafe {
let mut room: *mut Room = std::ptr::null_mut();
let room_ptr: *mut *mut Room = &mut room;
}
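As a sketch of how that out-parameter pattern plays out end to end: room_create and room_destroy below are hypothetical stand-ins for functions a C library would export, written in Rust here only so the example runs on its own. With a real library they would be declared in an extern block instead.

```rust
#[repr(C)]
pub struct Room {
    x_pos: i32,
    y_pos: i32,
}

/// Stand-in for a C function `int room_create(Room **out)` (hypothetical).
unsafe extern "C" fn room_create(out: *mut *mut Room) -> i32 {
    if out.is_null() {
        return -1;
    }
    let room = Box::into_raw(Box::new(Room { x_pos: 3, y_pos: 4 }));
    // SAFETY: `out` is non-null and the caller promises it is writable.
    unsafe { *out = room };
    0
}

/// Stand-in for a C function `void room_destroy(Room *room)` (hypothetical).
unsafe extern "C" fn room_destroy(room: *mut Room) {
    if !room.is_null() {
        // SAFETY: `room` was allocated by room_create via Box::into_raw.
        drop(unsafe { Box::from_raw(room) });
    }
}

/// Safe wrapper: hides the double pointer and returns the room's position.
fn make_room() -> Option<(i32, i32)> {
    let mut room: *mut Room = std::ptr::null_mut();
    // `&mut room` coerces to `*mut *mut Room`, matching the `Room **` parameter.
    let status = unsafe { room_create(&mut room) };
    if status != 0 || room.is_null() {
        return None;
    }
    // SAFETY: room_create succeeded, so `room` points to a valid Room.
    let position = unsafe { ((*room).x_pos, (*room).y_pos) };
    unsafe { room_destroy(room) };
    Some(position)
}

fn main() {
    println!("{:?}", make_room());
}
```

Keeping the raw double pointer inside one small wrapper function means the rest of the Rust code never has to see it.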
