I am currently trying to build a Huffman encoding program and am struggling with a problem I have while traversing my generated Huffman tree to create a lookup table. I decided to implement said traversal with a recursive function. In the actual implementation I use the bitvec crate to store bit sequences, but for simplicity I will use Vec<bool> in this post.
The idea I had was to save a collection of all codewords in the Vec codewords and then only save a slice out of that vector for the actual lookup table, for which I used a HashMap.
The issue is how exactly I would solve adding a 0 or a 1 for both the left and right traversal. My idea here was to save a clone of a slice of the current sequence, append a 0 to codewords, then append that clone to the end of codewords after traversing to the left so that I can push a 1 and traverse to the right. The function I came up with looks like this:
use std::collections::HashMap;
// ignore everything being public, I use getters in the real code
pub struct HufTreeNode {
pub val: u8,
pub freq: usize,
pub left: i16,
pub right: i16,
}
fn traverse_tree<'a>(
cur_index: usize,
height: i16,
codewords: &'a mut Vec<bool>,
lookup_table: &mut HashMap<u8, &'a [bool]>,
huffman_tree: &[HufTreeNode],
) {
let cur_node = &huffman_tree[cur_index];
// if the left child is -1, we reached a leaf
if cur_node.left == -1 {
// the last `height` bits in codewords
let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];
lookup_table.insert(cur_node.val, cur_sequence);
return;
}
// save the current sequence so we can traverse to the right afterwards
let mut cur_sequence = codewords[(codewords.len() - 1 - height as usize)..].to_vec();
codewords.push(false);
traverse_tree(
cur_node.left as usize,
height + 1,
codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
lookup_table,
huffman_tree,
);
// append the previously saved current sequence
codewords.append(&mut cur_sequence); // second mutable borrow occurs here
codewords.push(true); // third mutable borrow occurs here
traverse_tree(
cur_node.right as usize,
height + 1,
codewords, // fourth mutable borrow occurs here
lookup_table,
huffman_tree,
);
}
fn main() {
// ...
}
Apparently there is an issue with lifetimes and borrowing in that snippet of code, and I kind of get what the problem is. From what I understand, when I give codewords as a parameter in the recursive call, it has to borrow the vector for as long as I save the slice in lookup_table which is obviously not possible, causing the error. How do I solve this?
This is what cargo check gives me:
error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
--> untitled.rs:43:5
|
14 | fn traverse_tree<'a>(
| -- lifetime `'a` defined here
...
34 | / traverse_tree(
35 | | cur_node.left as usize,
36 | | height + 1,
37 | | codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
| | --------- first mutable borrow occurs here
38 | | lookup_table,
39 | | huffman_tree,
40 | | );
| |_____- argument requires that `*codewords` is borrowed for `'a`
...
43 | codewords.append(&mut cur_sequence); // second mutable borrow occurs here
| ^^^^^^^^^ second mutable borrow occurs here
error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
--> untitled.rs:44:5
|
14 | fn traverse_tree<'a>(
| -- lifetime `'a` defined here
...
34 | / traverse_tree(
35 | | cur_node.left as usize,
36 | | height + 1,
37 | | codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
| | --------- first mutable borrow occurs here
38 | | lookup_table,
39 | | huffman_tree,
40 | | );
| |_____- argument requires that `*codewords` is borrowed for `'a`
...
44 | codewords.push(true); // third mutable borrow occurs here
| ^^^^^^^^^ second mutable borrow occurs here
error[E0499]: cannot borrow `*codewords` as mutable more than once at a time
--> untitled.rs:48:9
|
14 | fn traverse_tree<'a>(
| -- lifetime `'a` defined here
...
34 | / traverse_tree(
35 | | cur_node.left as usize,
36 | | height + 1,
37 | | codewords, // mutable borrow - argument requires that `*codewords` is borrowed for `'a`
| | --------- first mutable borrow occurs here
38 | | lookup_table,
39 | | huffman_tree,
40 | | );
| |_____- argument requires that `*codewords` is borrowed for `'a`
...
48 | codewords, // fourth mutable borrow occurs here
| ^^^^^^^^^ second mutable borrow occurs here
What am I missing here? Is there some magical function in the vector API that I'm missing, and why exactly does this create lifetime issues in the first place? From what I can tell, all my lifetimes are correct because codewords always lives for long enough for lookup_table to save all those slices and I never mutably borrow something twice at the same time. If there was something wrong with my lifetimes, the compiler would complain inside the if cur_node.left == -1 block, and the cur_sequence I take after it is an owned Vec, so there can't be any borrowing issues with that. So the issue really is with the core idea of having a recursive function with a mutable reference as a parameter.
Is there any way for me to solve this? I tried making codewords owned and returning it, but then the compiler cannot ensure that the bit sequence I'm saving inside lookup_table lives long enough. The only idea I still have is to save owned vectors inside lookup_table, but at that point the codewords vector is obsolete: I could simply pass a cur_sequence vector as a parameter and clone it in every call. I chose my approach for better cache performance in the encoding step that follows, which I would then lose.
The problem is that when you create a slice cur_sequence from codewords, as in let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];, the compiler must keep codewords borrowed for at least as long as cur_sequence lives. The reason is that the slice has to stay valid: if you changed codewords (say, cleared it) while the slice was still alive, cur_sequence could point to invalid data, so the borrow rules forbid modifying codewords for as long as the slice exists. Unfortunately, you store cur_sequence in lookup_table, which keeps the borrow of codewords alive for the whole lifetime 'a, so you cannot mutably borrow codewords again.
The solution is to keep track of the slice indexes yourself. Create a struct:
struct Range {
start: usize,
end: usize
}
impl Range {
fn new(start: usize, end: usize) -> Self {
Range{ start, end}
}
}
then use it instead of the slices:
let cur_range = Range::new(
codewords.len() - height as usize, // start of the last `height` bits
codewords.len() // end (exclusive)
);
lookup_table.insert(cur_node.val, cur_range);
In this way, the responsibility to keep the ranges valid is yours.
Complete code:
use std::collections::HashMap;
// ignore everything being public, I use getters in the real code
pub struct HufTreeNode {
pub val: u8,
pub freq: usize,
pub left: i16,
pub right: i16,
}
struct Range {
start: usize,
end: usize
}
impl Range {
fn new(start: usize, end: usize) -> Self {
Range{ start, end}
}
}
fn traverse_tree(
cur_index: usize,
height: i16,
codewords: &mut Vec<bool>,
lookup_table: &mut HashMap<u8, Range>,
huffman_tree: &[HufTreeNode],
) {
let cur_node = &huffman_tree[cur_index];
// if the left child is -1, we reached a leaf
if cur_node.left == -1 {
// the last `height` bits in codewords, stored as a half-open index range
// let cur_sequence = &codewords[(codewords.len() - 1 - height as usize)..];
let cur_range = Range::new(
codewords.len() - height as usize,
codewords.len()
);
lookup_table.insert(cur_node.val, cur_range);
return;
}
// save the current sequence so we can traverse to the right afterwards
let mut cur_sequence = codewords[(codewords.len() - height as usize)..].to_vec();
codewords.push(false);
traverse_tree(
cur_node.left as usize,
height + 1,
codewords, // reborrowed only for the duration of this call
lookup_table,
huffman_tree,
);
// append the previously saved current sequence
codewords.append(&mut cur_sequence);
codewords.push(true);
traverse_tree(
cur_node.right as usize,
height + 1,
codewords,
lookup_table,
huffman_tree,
);
}
fn main() {
// ...
}
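A minimal end-to-end sketch of the index-based idea, under a few assumptions that are not in the original code: it uses std::ops::Range<usize> instead of a hand-written struct, keeps the current prefix in a separate path vector (a hypothetical name) rather than re-appending it to codewords, and works on a trimmed-down Node type. Slices are only borrowed from codewords once the traversal has finished, so no shared borrow overlaps with a mutation:

```rust
use std::collections::HashMap;
use std::ops::Range;

// Trimmed-down node for illustration; -1 marks "no child" as in the question.
struct Node {
    val: u8,
    left: i16,
    right: i16,
}

// Appends each leaf's codeword to `codewords` and records where it landed.
fn collect_codes(
    idx: usize,
    path: &mut Vec<bool>,
    codewords: &mut Vec<bool>,
    table: &mut HashMap<u8, Range<usize>>,
    tree: &[Node],
) {
    let node = &tree[idx];
    if node.left == -1 {
        // Leaf: copy the current path and remember its index range.
        let start = codewords.len();
        codewords.extend_from_slice(path.as_slice());
        table.insert(node.val, start..codewords.len());
        return;
    }
    path.push(false);
    collect_codes(node.left as usize, path, codewords, table, tree);
    path.pop();
    path.push(true);
    collect_codes(node.right as usize, path, codewords, table, tree);
    path.pop();
}

// Hypothetical three-symbol tree: 'a' = 0, 'b' = 10, 'c' = 11.
fn demo_tree() -> Vec<Node> {
    vec![
        Node { val: 0, left: 1, right: 2 },
        Node { val: b'a', left: -1, right: -1 },
        Node { val: 0, left: 3, right: 4 },
        Node { val: b'b', left: -1, right: -1 },
        Node { val: b'c', left: -1, right: -1 },
    ]
}

fn main() {
    let tree = demo_tree();
    let mut codewords = Vec::new();
    let mut table = HashMap::new();
    collect_codes(0, &mut Vec::new(), &mut codewords, &mut table, &tree);

    // All mutation is done, so shared slices into `codewords` are fine now.
    let lookup: HashMap<u8, &[bool]> = table
        .iter()
        .map(|(&sym, range)| (sym, &codewords[range.clone()]))
        .collect();
    println!("{:?}", lookup[&b'b']); // [true, false]
}
```

All codewords still live contiguously in one vector, which is the property the question wanted for the encoding step; the table of slices is built in a second pass once nothing needs to push to codewords anymore.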
There are a lot of answers to questions about Rust's error[E0502], but I can't really understand one particular case. I have a struct and its impl method that goes like this:
struct Test {
test_vec: Vec<i32>,
}
impl Test {
// other methods...
fn test(&mut self) -> i32 {
self.test_vec.swap(0, self.test_vec.len() - 1);
// other operations...
}
}
Trying to compile that immediately results in error:
error[E0502]: cannot borrow self.test_vec as immutable because it is also borrowed as mutable
self.test_vec.swap(0, self.test_vec.len() - 1);
------------- ---- ^^^^^^^^^^^^^ immutable borrow occurs here
| |
| mutable borrow later used by call
mutable borrow occurs here
Can anyone please explain why? It doesn't really look like I'm trying to borrow self.test_vec there, I'm passing the usize type result of a len() call. On the other hand:
fn test(&mut self) -> i32 {
let last_index = self.test_vec.len() - 1;
self.test_vec.swap(0, last_index);
// other operations...
}
Using a temporary variable, it works as expected, which makes me think that the len() call is somehow evaluated after it gets to the swap, and is thus still borrowed? Am I not seeing something because of the syntactic sugar?
You have to think of this in the way the compiler does. When you write:
self.test_vec.swap(0, self.test_vec.len() - 1);
What the compiler sees:
let temp1 = &mut self.test_vec; // Mutable borrow of self.test_vec
let temp2 = &self.test_vec; // (ERROR!) Shared borrow of self.test_vec for use on getting the length
let temp3 = Vec::len(temp2) - 1;
Vec::swap(temp1, 0, temp3);
As you can see, you are borrowing self.test_vec mutably first, and then trying to get the length, which is another borrow. Since the first borrow is mutable and still in effect, the second borrow is illegal.
When you use a temporary variable, you are effectively reordering your borrows and since self.test_vec.len() terminates the borrow before the next mutable borrow, there are no conflicts.
You can argue the compiler should be able to see that your code can be correct (if interpreted in the right way), but the compiler is clearly not smart enough yet to do so.
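As a small sketch of the reordering described above, the length can be computed before the mutable borrow starts. The function name swap_ends is hypothetical, and the use of checked_sub is an addition of this sketch to avoid the underflow that len() - 1 would hit on an empty vector:

```rust
/// Swaps the first and last elements; does nothing for an empty vector.
fn swap_ends(v: &mut Vec<i32>) {
    // The shared borrow taken by `len()` ends as soon as the call returns.
    if let Some(last) = v.len().checked_sub(1) {
        // Only the mutable borrow needed by `swap` is alive here.
        v.swap(0, last);
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    swap_ends(&mut v);
    println!("{:?}", v); // [3, 2, 1]

    let mut empty: Vec<i32> = Vec::new();
    swap_ends(&mut empty); // no panic
}
```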
In this code, I take a vector, create a struct instance, and add it to the vector boxed:
trait T {}
struct X {}
impl T for X {}
fn add_inst(vec: &mut Vec<Box<T>>) -> &X {
let x = X {};
vec.push(Box::new(x));
// Ugly, unsafe hack I made
unsafe { std::mem::transmute(&**vec.last().unwrap()) }
}
Obviously, it uses mem::transmute, which makes me feel it's not the right way to do this. Is this ugly hack the only way to do it?
Additionally, while this compiles in Rust 1.32, it fails in Rust 1.34:
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/lib.rs:10:14
|
10 | unsafe { std::mem::transmute(&**vec.last().unwrap()) }
| ^^^^^^^^^^^^^^^^^^^
|
= note: source type: `&dyn T` (128 bits)
= note: target type: `&X` (64 bits)
I think that this code is safe:
fn add_inst(vec: &mut Vec<Box<dyn T>>) -> &X {
let x = X {};
let b = Box::new(x);
let ptr = &*b as *const X;
vec.push(b);
unsafe { &*ptr }
}
The trick is to save a raw *const X pointer to the value before the box is converted to a Box<dyn T>. Then you can convert it back to a reference before returning it from the function.
It is safe because a boxed value is never moved (unless it is moved out of the Box, of course), so ptr survives the cast of b into Box<dyn T>.
Your "ugly hack" is actually completely incorrect and unsafe. You were unlucky that Rust 1.32 doesn't report the error, but thankfully Rust 1.34 does.
When you store a boxed value, you create a thin pointer. This takes up the platform-native size of an integer (e.g. 32-bit on 32-bit x86, 64-bit on 64-bit x86, etc.):
+----------+
| pointer |
| (0x1000) |
+----------+
When you store a boxed trait object, you create a fat pointer. This contains the same pointer to the data and a reference to the vtable. This pointer is two native integers in size:
+----------+----------+
| pointer | vtable |
| (0x1000) | (0xBEEF) |
+----------+----------+
By attempting to perform a transmute from the trait object to the reference, you are losing one of those pointers, but it's not defined which one. There's no guarantee which comes first: the data pointer or the vtable.
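The size difference between the two kinds of pointer can be checked directly with std::mem::size_of. This is only an illustrative sketch; the helper name pointer_sizes is hypothetical:

```rust
use std::mem::size_of;

trait T {}
struct X {}
impl T for X {}

// Returns (size of a thin reference, size of a trait-object reference) in bytes.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&X>(), size_of::<&dyn T>())
}

fn main() {
    let (thin, fat) = pointer_sizes();
    println!("&X: {} bytes, &dyn T: {} bytes", thin, fat);
    // A thin reference is one machine word, a trait-object reference is two.
    assert_eq!(thin, size_of::<usize>());
    assert_eq!(fat, 2 * size_of::<usize>());
    // The same holds for the boxed trait object.
    assert_eq!(size_of::<Box<dyn T>>(), fat);
}
```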
One solution would use std::raw::TraitObject, but this is unstable because the layout of fat pointers is still up in the air.
The solution I would recommend, which requires no unsafe code, is to use Any:
use std::any::Any;
trait T: Any {}
struct X {}
impl T for X {}
fn add_inst(vec: &mut Vec<Box<dyn T>>) -> &X {
let x = X {};
vec.push(Box::new(x));
let l = vec.last().unwrap();
Any::downcast_ref(l).unwrap()
}
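A commonly used variant of the Any approach adds an explicit as_any method to the trait, so that the downcast is always performed on the value stored inside the Box. This is a sketch under assumptions: as_any is a helper introduced here, and the id field and parameter exist only to make the result observable. It makes explicit which value is being downcast, which is worth double-checking whenever a Box or a reference to one is handed to Any::downcast_ref directly:

```rust
use std::any::Any;

trait T: Any {
    // Hypothetical helper: exposes the concrete value as `&dyn Any`.
    fn as_any(&self) -> &dyn Any;
}

struct X {
    id: u32,
}

impl T for X {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn add_inst(vec: &mut Vec<Box<dyn T>>, id: u32) -> &X {
    vec.push(Box::new(X { id }));
    // Look through the Box so the method is called on the trait object itself.
    let last: &dyn T = &**vec.last().expect("an element was just pushed");
    last.as_any()
        .downcast_ref::<X>()
        .expect("the element just pushed is an X")
}

fn main() {
    let mut v: Vec<Box<dyn T>> = Vec::new();
    let x = add_inst(&mut v, 7);
    println!("{}", x.id); // 7
}
```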
If you couldn't / don't want to use Any, I've been told that casting a trait object pointer to a pointer to a concrete type will only keep the data pointer. Unfortunately, I cannot find an official reference for this, which means I can't fully vouch for this code, although it empirically works:
fn add_inst(vec: &mut Vec<Box<dyn T>>) -> &X {
let x = X {};
vec.push(Box::new(x));
let last: &dyn T = &**vec.last().unwrap();
// I copied this code from Stack Overflow without reading
// it and it may not actually be safe.
unsafe {
let trait_obj_ptr = last as *const dyn T;
let value_ptr = trait_obj_ptr as *const X;
&*value_ptr
}
}
See also:
Why can comparing two seemingly equal pointers with == return false?
How to get a reference to a concrete type from a trait object?
Accessing the last element of a Vec or a slice
What is the best way to store a pointer as a value of another pointer?
I have a variable ptr that is of type *mut u8. How do I store the address that the ptr points to as the value of another pointer t that is also of type *mut u8.
I am trying to do something like
*t = ptr;
I get an "expected u8, found *-ptr" error. I understand the address ptr will be 64 bits. I want to fill up 64 bits starting from address t.
I have a variable ptr that is of type *mut u8. How do I store the address that the ptr points to as the value of another pointer t that is also of type *mut u8
Assign one pointer to another:
use std::ptr;
fn main() {
let ptr: *mut u8 = ptr::null_mut();
let t: *mut u8 = ptr;
}
ptr is a pointer and the address it points to is NULL. This value is now stored in the pointer t of the same type as ptr: t points to the address NULL.
+-----+ +-----+
| | | |
| ptr | | t |
| | | |
+--+--+ +--+--+
| |
| |
+---->NULL<----+
If you wanted to have t be a pointer to the address of another pointer, you would need to take a reference to ptr. The types also could not be the same:
use std::ptr;
fn main() {
let ptr: *mut u8 = ptr::null_mut();
let t: *const *mut u8 = &ptr;
}
+-----+ +-----+
| | | |
| t +------> ptr +----->NULL
| | | |
+-----+ +-----+
I am looking for a way to write the address that the ptr points to to a specific location so that I can get the address even when I don't have t
Raw pointers have no compiler-enforced lifetimes associated with them. If you want to keep the address of something after the value has disappeared, that's an ideal case for them — you don't have to do anything:
use std::ptr;
fn do_not_dereference_this_result() -> *const u8 {
let val: u8 = 127;
let ptr: *const u8 = &val;
ptr
}
fn main() {
println!("{:p}", do_not_dereference_this_result())
}
In rarer cases, you might want to store the address in a usize (a pointer-sized integer value):
use std::ptr;
fn do_not_dereference_this_result() -> usize {
let val: u8 = 127;
let ptr: *const u8 = &val;
ptr as usize
}
fn main() {
println!("{:x}", do_not_dereference_this_result())
}
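If the goal is literally to write the pointer's bits into memory starting at some location, as the question describes, one possible sketch is to treat the destination as a *mut *mut u8 and use std::ptr::write_unaligned. The function names and the byte buffer standing in for the memory behind t are assumptions made for illustration:

```rust
use std::mem::size_of;
use std::ptr;

// Writes the address held in `value` into the first pointer-sized bytes of `dest`.
fn store_address(dest: &mut [u8], value: *mut u8) {
    assert!(dest.len() >= size_of::<*mut u8>(), "destination too small");
    // SAFETY: the length check guarantees room for one pointer, and
    // `write_unaligned` does not require `dest` to be pointer-aligned.
    unsafe { ptr::write_unaligned(dest.as_mut_ptr() as *mut *mut u8, value) }
}

// Reads a pointer back from the first pointer-sized bytes of `src`.
fn load_address(src: &[u8]) -> *mut u8 {
    assert!(src.len() >= size_of::<*mut u8>(), "source too small");
    // SAFETY: the length check guarantees one full pointer can be read.
    unsafe { ptr::read_unaligned(src.as_ptr() as *const *mut u8) }
}

fn main() {
    let mut target: u8 = 42;
    let ptr: *mut u8 = &mut target;

    let mut storage = [0u8; size_of::<*mut u8>()];
    store_address(&mut storage, ptr);

    let restored = load_address(&storage);
    assert_eq!(restored, ptr);
    // `target` is still alive here, so dereferencing is sound.
    println!("{}", unsafe { *restored }); // 42
}
```

Storing the address is the easy part; whether the restored pointer may be dereferenced later still depends entirely on the pointee being alive at that point.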
It really sounds like you are confused by how pointers work, which is a pretty good sign that you are going to shoot yourself in the foot if you use them. I'd strongly encourage you to solely use references in any important code until your understanding of pointers has increased.
You generally don't want to do that, but I'll trust you know what you're doing.
The key here is to understand a few things:
Box<T> is roughly equivalent to a *mut T that points to a heap allocation; you can use Box::into_raw to convert it into a *mut T.
If you do this, you're effectively leaking the heap allocated Box<T>, because Rust no longer knows where it is, or tracks it. You must manually convert it back into a droppable object at some point, for example using Box::from_raw.
You must Box::new(...) a value to ensure it is put on the heap, otherwise your raw pointer will point into the stack, which will eventually become invalid.
Mutable aliasing (which means two &mut T pointing to the same data) causes undefined behavior. It is extremely important to understand that undefined behavior is not triggered by concurrent writes to mutable aliases... it is triggered by mutable aliases existing at the same time, in any scope.
...but, if you really want to, you'd do it like this:
let foo_ref = Box::into_raw(Box::new(10));
let foo_ref_ref = Box::into_raw(Box::new(foo_ref));
// Modify via raw pointer
unsafe {
**(foo_ref_ref as *const *mut i32) = 100;
}
// Read via raw pointer
unsafe {
println!("{:?}", **foo_ref_ref);
}
// Resolve leaked memory
unsafe {
drop(Box::from_raw(foo_ref_ref));
drop(Box::from_raw(foo_ref));
}
I'm trying to pass a mutable slice to a function and use it in several loops inside it.
function1 produces an error. Changing to function2 or function3 makes the errors disappear, but I don't understand the differences between function1 and function2. v and &mut *v seem similar to me.
Why doesn't function1 work while the others do?
fn main() {
let mut v = Vec::new();
function1(&mut v);
function2(&mut v);
function3(&mut v);
}
// Move Error
fn function1(v: &mut [i32]) {
for l in v {}
for l in v {} // <-- Error Here !!!
}
// Works Fine
fn function2(v: &mut [i32]) {
for l in &mut *v {}
for l in &mut *v {}
}
// Works Fine
fn function3(v: &mut [i32]) {
for l in v.iter_mut() {}
for l in v.iter_mut() {}
}
The error:
error[E0382]: use of moved value: `v`
--> src/main.rs:12:14
|
11 | for l in v {}
| - value moved here
12 | for l in v {} // <-- Error Here !!!
| ^ value used here after move
|
= note: move occurs because `v` has type `&mut [i32]`, which does not implement the `Copy` trait
&mut *v is doing a so-called "reborrow".
This means that instead of iterating over the original reference, you are iterating over a new reference.
Think about it this way:
If you have an owned vector, and you iterate over it, then you get the same error if you try iterating over it again, because it has been moved into the for loop.
If instead you borrow the vector and iterate over the borrow, then you can do that as many times as you want.
If you have a mutable borrow, and you iterate over it, then you are moving the mutable borrow into the for loop. So it's gone now.
If instead you create a new reference pointing into the mutable borrow, you are just moving out of the new reference. Once the iteration finishes, the new mutable borrow is gone, meaning that the original mutable borrow can be accessed again.
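The different ways of reborrowing can be put side by side in a small sketch. The function names bump and process are hypothetical; the point is that an explicit &mut *v, a method call such as iter_mut(), and passing v to a parameter whose type is spelled out as &mut [i32] all reborrow instead of moving v:

```rust
// Adds 10 to every element; the parameter type is spelled out, not generic.
fn bump(v: &mut [i32]) {
    for x in v.iter_mut() {
        *x += 10;
    }
}

fn process(v: &mut [i32]) {
    // Explicit reborrow: the loop consumes a temporary `&mut [i32]`, not `v` itself.
    for x in &mut *v {
        *x *= 2;
    }
    // Method calls reborrow automatically.
    for x in v.iter_mut() {
        *x += 1;
    }
    // Passing `v` where the parameter type is explicitly `&mut [i32]`
    // also reborrows implicitly, so `v` can be passed more than once.
    bump(v);
    bump(v);
}

fn main() {
    let mut data = vec![1, 2, 3];
    process(&mut data);
    println!("{:?}", data); // [23, 25, 27]
}
```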
Here is an example of how to transmute a Sized type from a raw pointer:
use std::mem;
#[derive(Eq, PartialEq)]
#[repr(packed)]
struct Bob {
id: u32,
age: u32,
}
unsafe fn get_type<'a, T: Sized>(p: *const u8) -> &'a T {
mem::transmute(p)
}
#[test]
fn it_works() {
let bob = Bob {
id: 22,
age: 445,
};
let bob2: &Bob = unsafe {
let ptr: *const u8 = mem::transmute(&bob);
get_type(ptr)
};
assert_eq!(&bob, bob2);
}
For my application, though, I want to be able to get a ?Sized type instead of a Sized type, and this doesn't work:
unsafe fn get_type2<'a, T: ?Sized>(p: *const u8) -> &'a T {
mem::transmute(p)
}
It fails with this error message:
error: transmute called with differently sized types: *const u8 (64 bits) to &'a T (pointer to T) [--explain E0512]
--> src/main.rs:2:9
|>
2 |> mem::transmute(p)
|> ^^^^^^^^^^^^^^
I have tried to give it a &[u8] (fat pointer) by converting it using std::slice::from_raw_parts, but it fails with pretty much the same error message.
You actually cannot for the very reason cited in the error message.
Rust references can be either pointer-sized (for Sized types) or bigger (for !Sized types). For example, if Trait is a trait, a &Trait reference is actually two fields as defined by std::raw::TraitObject.
So, in order to form a reference to an unsized type, you have to:
identify exactly what kind of unsized type it is (trait? slice? ...)
pick the right representation (std::raw::TraitObject, std::raw::Slice, ...)
and then you have to fill in the blanks (there is more than just a pointer).
So, unless you can limit your function to producing &T where T: Sized, you cannot just transmute a raw pointer to &T.
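For the slice case specifically, the standard library already provides a stable way to fill in the missing piece (the length): std::slice::from_raw_parts. The following is a sketch with hypothetical function names; for the Sized case it uses a plain pointer cast rather than transmute. In both functions the caller is assumed to guarantee that the pointer is valid, suitably aligned for T, and outlives 'a:

```rust
use std::slice;

// Sized target: a pointer cast is enough, no transmute needed.
// Caller must guarantee `p` is valid, aligned for `T`, and lives for `'a`.
unsafe fn get_sized<'a, T>(p: *const u8) -> &'a T {
    unsafe { &*(p as *const T) }
}

// Slice target: the element count is the extra information that must be supplied.
// Caller must guarantee `p` points to `len` valid, aligned values of type `T`.
unsafe fn get_slice<'a, T>(p: *const u8, len: usize) -> &'a [T] {
    unsafe { slice::from_raw_parts(p as *const T, len) }
}

fn main() {
    let values: [u32; 3] = [10, 20, 30];
    let p = values.as_ptr() as *const u8;

    let first: &u32 = unsafe { get_sized(p) };
    let all: &[u32] = unsafe { get_slice(p, values.len()) };
    println!("{} {:?}", first, all); // 10 [10, 20, 30]
}
```

A single generic function over T: ?Sized cannot do this, because each kind of unsized type needs different extra data (a length for slices, a vtable for trait objects).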