Rust, how do i correctly free heap allocated memory? - pointers

I wanted to reinvent the wheel(reference counting smart pointer) and i am not sure how to properly free the memory leaked with Box::into_raw() , once the references go to zero i don't know how to efficiently free the memory that is being pointed
I originally went with
impl<T> Drop for SafePtr<T>{
fn drop(&mut self) {
//println!("drop, {} refs", self.get_refs());
self.dec_refs();
let ref_count = self.get_refs();
if ref_count == 0usize{
unsafe{
let _ = Box::from_raw(self.ptr);
let _ = Box::from_raw(self.refs);
};
println!("Dropped all pointed values");
};
}
}
but i was wondering if ptr::drop_in_place() would work the same if not better since it won't have to make a Box just to drop it

As you can see from the documentation on into_raw just drop_in_place is not enough, to free the memory you also have to call dealloc:
use std::alloc::{dealloc, Layout};
use std::ptr;
let x = Box::new(String::from("Hello"));
let p = Box::into_raw(x);
unsafe {
ptr::drop_in_place(p);
dealloc(p as *mut u8, Layout::new::<String>());
}
For performance considerations, both methods compile to the exact same instructions, so I'd just use drop(Box::from_raw(ptr)) to save me the hussle remembering the dealloc where applicable.

Related

How to please the borrow checker when implementing a 'too large to fit in memory' datastructure?

I am implementing a simple B-tree in Rust (as hobby project).
The basic implementation works well.
As long as everything fits in memory, nodes can store direct references to their children and it is easy to traverse and use them.
However, to be able to store more data inside than fits in RAM, it should be possible to store tree nodes on disk and retrieve parts of the tree only when they are needed.
But now it becomes incredibly hard to return a reference to a part of the tree.
For instance, when implementing Node::lookup_value:
extern crate arrayvec;
use arrayvec::ArrayVec;
use std::collections::HashMap;
type NodeId = usize;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
Leaf(ArrayVec<Value, 15>),
Internal(ArrayVec<NodeId, 16>)
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
pub keys: ArrayVec<Key, 15>,
pub contents: NodeContents<Value>,
}
pub trait Pool<T> {
type PoolId;
/// Invariants:
/// - Ony call `lookup` with a key which you *know* exists. This should always be possible, as stuff is never 'simply' removed.
/// - Do not remove or alter the things backing the pool in the meantime (Adding more is obviously allowed!).
fn lookup(&self, key: &Self::PoolId) -> T;
}
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Debug)]
pub struct EphemeralPoolId(usize);
// An example of a pool that keeps all nodes in memory
// Similar (more complicated) implementations can be made for pools storing nodes on disk, remotely, etc.
pub struct EphemeralPool<T> {
next_id: usize,
contents: HashMap<usize, T>,
}
impl<T: Clone> Pool<T> for EphemeralPool<T> {
type PoolId = EphemeralPoolId;
fn lookup(&self, key: &Self::PoolId) -> T {
let val = self.contents.get(&key.0).expect("Tried to look up a value in the pool which does not exist. This breaks the invariant that you should only use it with a key which you received earlier from writing to it.").clone();
val
}
}
impl<Key: Ord, Value> Node<Key, Value> {
pub fn lookup_value<'pool>(&self, key: &Key, pool: &'pool impl Pool<Node<Key, Value>, PoolId=NodeId>) -> Option<&Value> {
match &self.contents {
NodeContents::Leaf(values) => { // Simple base case. No problems here.
self.keys.binary_search(key).map(|index| &values[index]).ok()
},
NodeContents::Internal(children) => {
let index = self.keys.binary_search(key).map(|index| index + 1).unwrap_or_else(|index| index);
let child_id = &children[index];
// Here the borrow checker gets mad:
let child = pool.lookup(child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
child.lookup_value(key, pool)
}
}
}
}
Rust Playground
The borrow checker gets mad.
I understand why, but not why to solve it.
It gets mad, because NodePool::lookup returns a node by value. (And it has to do so, because it reads it from disk. In the future there may be a cache in the middle, but elements from this cache might be removed at any time, so returning references to elements in this cache is also not possible.)
However, lookup_value returns a reference to a tiny part of this value. But the value stops being around once the function returns.
In the Borrow Checker's parlance:
child.lookup_value(key, pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function
How to solve this problem?
The simplest solution would be to change the function in general to always return by-value (Option<V>). However, these values are often large and I would like to refrain from needless copies. It is quite likely that lookup_value is called in quick succession with closely related keys. (And also, there is a more complex scenario which the borrow checker similarly complains about, where we look up a range of sequential values with an interator. The same situation applies there.)
Are there ways in which I can make sure that child lives long enough?
I have tried to pass in an extra &mut HashMap<NodeId, Arc<Node<K, V>>> argument to store all children needed in the current traversal to keep their references alive for long enough. (See this variant of the code on the Rust Playground here)
However, then you end up with a recursive function that takes (and modifies) a hashmap as well as an element in the hashmap. Rust (rightfully) blocks this from working: It's (a) not possible to convince Rust that you won't overwrite earlier values in the hashmap, and (b) it might need to be reallocated when it grows which would also invalidate the references.
So that does not work.
Are there other solutions? (Even ones possibly involving unsafe?)
Minimal Reproducible Example
Your example contained a lot of errors and undefined types. Please provide a proper minimal reproducible example next time to avoid frustration and increase the chance of getting a quality answer.
EDIT: I saw your modified question too late with the proper minimal example, I hope this one fits. If not, I'll take another look. Let me know.
Either way, I attempted to reverse-engineer what your minimal example could have been:
use std::marker::PhantomData;
type NodeId = usize;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
Leaf(Vec<Value>),
Internal(Vec<NodeId>),
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
pub keys: Vec<Key>,
pub contents: NodeContents<Value>,
}
struct NodePool<Key, Value> {
_p: PhantomData<(Key, Value)>,
}
impl<Key, Value> NodePool<Key, Value> {
fn lookup(&self, _child_id: NodeId) -> Node<Key, Value> {
todo!()
}
}
impl<Key, Value> Node<Key, Value>
where
Key: Ord,
{
pub fn lookup_value(&self, key: &Key, pool: &NodePool<Key, Value>) -> Option<&Value> {
match &self.contents {
NodeContents::Leaf(values) => {
// Simple base case. No problems here.
self.keys
.binary_search(key)
.map(|index| &values[index])
.ok()
}
NodeContents::Internal(children) => {
let index = self
.keys
.binary_search(key)
.map(|index| index + 1)
.unwrap_or_else(|index| index);
let child_id = &children[index];
// Here the borrow checker gets mad:
let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
child.lookup_value(key, pool)
}
}
}
}
error[E0515]: cannot return reference to local variable `child`
--> src/lib.rs:50:17
|
50 | child.lookup_value(key, pool)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function
Thoughts and Potential Solutions
The main question you have to ask yourself: Who owns the data once it's loaded from disk?
This is not a simple question with a simple answer. It opens many new questions:
If you query the same key twice, should it load from disk twice? Or should it be cached and some other previously cached value should be moved to disk?
If it should be loaded from disk twice, should the entire Node that lookup returned be kept alive until the borrow is returned, or should only the Value be moved out and kept alive?
If your answer is "it should be loaded twice" and "only the Value should be kept alive", you could use an enum that can carry either a borrowed or owned Value:
enum ValueHolder<'a, Value> {
Owned(Value),
Borrowed(&'a Value),
}
impl<'a, Value> std::ops::Deref for ValueHolder<'a, Value> {
type Target = Value;
fn deref(&self) -> &Self::Target {
match self {
ValueHolder::Owned(val) => val,
ValueHolder::Borrowed(val) => val,
}
}
}
In your code, it could look like this:
use std::marker::PhantomData;
type NodeId = usize;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
Leaf(Vec<Value>),
Internal(Vec<NodeId>),
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
pub keys: Vec<Key>,
pub contents: NodeContents<Value>,
}
pub struct NodePool<Key, Value> {
_p: PhantomData<(Key, Value)>,
}
impl<Key, Value> NodePool<Key, Value> {
fn lookup(&self, _child_id: NodeId) -> Node<Key, Value> {
todo!()
}
}
pub enum ValueHolder<'a, Value> {
Owned(Value),
Borrowed(&'a Value),
}
impl<'a, Value> std::ops::Deref for ValueHolder<'a, Value> {
type Target = Value;
fn deref(&self) -> &Self::Target {
match self {
ValueHolder::Owned(val) => val,
ValueHolder::Borrowed(val) => val,
}
}
}
impl<Key, Value> Node<Key, Value>
where
Key: Ord,
{
pub fn lookup_value(
&self,
key: &Key,
pool: &NodePool<Key, Value>,
) -> Option<ValueHolder<Value>> {
match &self.contents {
NodeContents::Leaf(values) => {
// Simple base case. No problems here.
self.keys
.binary_search(key)
.map(|index| ValueHolder::Borrowed(&values[index]))
.ok()
}
NodeContents::Internal(children) => {
let index = self
.keys
.binary_search(key)
.map(|index| index + 1)
.unwrap_or_else(|index| index);
let child_id = &children[index];
// Here the borrow checker gets mad:
let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
child
.into_lookup_value(key, pool)
.map(|val| ValueHolder::Owned(val))
}
}
}
pub fn into_lookup_value(self, key: &Key, pool: &NodePool<Key, Value>) -> Option<Value> {
match self.contents {
NodeContents::Leaf(mut values) => {
// Simple base case. No problems here.
self.keys
.binary_search(key)
.map(|index| values.swap_remove(index))
.ok()
}
NodeContents::Internal(children) => {
let index = self
.keys
.binary_search(key)
.map(|index| index + 1)
.unwrap_or_else(|index| index);
let child_id = &children[index];
// Here the borrow checker gets mad:
let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
child.into_lookup_value(key, pool)
}
}
}
}
In all other cases, however, it seems like further thought needs to be put into restructuring the project.
After a lot of searching and trying, and looking at what other databases are doing (great suggestion, #Finomnis!) the solution presented itself.
Use an Arena
The intuition to use a HashMap was a good one, but you have the problems with lifetimes, for two reasons:
A HashMap might be reallocated when growing, which would invalidate old references.
You cannot convince the compiler that new insertions into the Hashmap will not overwrite existing values.
However, there is an alternative datastructure which is able to provide these guarantees: Arena allocators.
With these, you essentially 'tie' the lifetimes of the values you insert into them, to the lifetime of the allocator as a whole.
Internally (at least conceptually), an arena keeps track of a list of memory regions in which data might be stored, the most recent one of which is not entirely full. Individual memory regions are never reallocated to ensure that old references remain valid; instead, when the current memory region is full, a new (larger) one is allocated and added to the list, becoming the new 'current'.
Everything is destroyed at once at the point the arena itself is deallocated.
There are two prevalent Rust libraries that provide arenas. They are very similar, with slightly different trade-offs:
typed-arena allows only a single type T per arena, but runs all Drop implementations when the arena is dropped.
bumpalo allows different objects to be stored in the same arena, but their Drop implementations do not run.
In the common situation of having only a single datatype with no special drop implementation, you might want to benchmark which one is faster in your use-case.
In the original question, we are dealing with a cache, and it makes sense to reduce copying of large datastructures by passing around Arcs between the cache and the arena. This means that you can only use typed-arena in this case, as Arc has a special Drop implementation.

Why is returning a cloned ndarray throwing an overflow error(exceeded max recursion limit)?

I'm currently trying to write a function that is generally equivalent to numpy's tile. Currently each time I try to return a (altered or unaltered) clone of the input array, I get a warning about an overflow, and cargo prompts me to increase the recursion limit. however this function isn't recursive, so I'm assuming its happening somewhere in the implementation.
here is the stripped down function, (full version):
pub fn tile<A, D1, D2>(arr: &Array<A, D1>, reps: Vec<usize>) -> Array<A, D2>
where
A: Clone,
D1: Dimension,
D2: Dimension,
{
let num_of_reps = reps.len();
//just clone the array if reps is all ones
let mut res = arr.clone();
let bail_flag = true;
for &x in reps.iter() {
if x != 1 {
bail_flag = false;
}
}
if bail_flag {
let mut res_dim = res.shape().to_owned();
_new_shape(num_of_reps, res.ndim(), &mut res_dim);
res.to_shape(res_dim);
return res;
}
...
//otherwise do extra work
...
return res.reshape(shape_out);
}
this is the actual error I'm getting on returning res:
overflow evaluating the requirement `&ArrayBase<_, _>: Neg`
consider increasing the recursion limit by adding a `#![recursion_limit = "1024"]` attribute to your crate (`mfcc_2`)
required because of the requirements on the impl of `Neg` for `&ArrayBase<_, _>`
511 redundant requirements hidden
required because of the requirements on the impl of `Neg` for `&ArrayBase<OwnedRepr<A>, D1>`rustcE0275
I looked at the implementation of Neg in ndarray, it doesn't seem to be recursive, so I'm a little confused as to what is going on.
p.s. I'm aware there are other errors in this code, as those appeared after I switched from A to f64(the actual type I plan on using the function with), but those are mostly trivial to fix. Still if you have suggestions on any error you see I appreciate them nonetheless.

How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate]

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

Is there a more friendly RefCell-like object?

I'm looking for a class much like Vec<RefCell<T>>, in that it is the ultimate owner & allocator of all of its data, yet different pieces of the array can be be mutably borrowed by multiple parties indefinitely.
I emphasize indefinitely because of course pieces of Vec<T> can also be mutably borrowed by multiple parties, but doing so involves making a split which can only be resolved after the parties are done borrowing.
Vec<RefCell<T>> seems to be a world of danger and many ugly if statements checking borrow_state, which seems to be unstable. If you do something wrong, then kablammo! Panic! This is not what a lending library is like. In a lending library, if you ask for a book that isn't there, they tell you "Oh, it's checked out." Nobody dies in an explosion.
So I would like to write code something like this:
let mut a = LendingLibrary::new();
a.push(Foo{x:10});
a.push(Foo{x:11});
let b1 = a.get(0); // <-- b1 is an Option<RefMut<Foo>>
let b2 = a.get(1); // <-- b2 is an Option<RefMut<Foo>>
// the 0th element has already been borrowed, so...
let b3 = a.get(0); // <-- b3 is Option::None
Does such a thing exist? Or is there another canonical way to get this kind of behavior? A kind of "friendly RefCell"?
If the answer happens to be yes, is there also a threadsafe variant?
RefCell is not designed for long-lived borrows. The typical use case is that in a function, you'll borrow the RefCell (either mutably or immutably), work with the value, then release the borrow before returning. I'm curious to know how you're hoping to recover from a borrowed RefCell in a single-threaded context.
The thread-safe equivalent to RefCell is RwLock. It has try_read and try_write functions that do not block or panic if an incompatible lock is still acquired (on any thread, including the current thread). Contrarily to RefCell, it makes sense to just retry later if locking a RwLock fails, since another thread might just happen to have locked it at the same time.
If you end up always using write or try_write, and never read or try_read, then you should probably use the simpler Mutex instead.
#![feature(borrow_state)]
use std::cell::{RefCell, RefMut, BorrowState};
struct LendingLibrary<T> {
items: Vec<RefCell<T>>
}
impl<T> LendingLibrary<T> {
fn new(items: Vec<T>) -> LendingLibrary<T> {
LendingLibrary {
items: items.into_iter().map(|e| RefCell::new(e)).collect()
}
}
fn get(&self, item: usize) -> Option<RefMut<T>> {
self.items.get(item)
.and_then(|cell| match cell.borrow_state() {
BorrowState::Unused => Some(cell.borrow_mut()),
_ => None
})
}
}
fn main() {
let lib = LendingLibrary::new(vec![1, 2, 3]);
let a = lib.get(0); // Some
let b = lib.get(1); // Some
let a2 = lib.get(0); // None
}
This currently requires a nightly release to work.

Cannot move out of borrowed content when borrowing a generic type

I have a program that more or less looks like this
struct Test<T> {
vec: Vec<T>
}
impl<T> Test<T> {
fn get_first(&self) -> &T {
&self.vec[0]
}
fn do_something_with_x(&self, x: T) {
// Irrelevant
}
}
fn main() {
let t = Test { vec: vec![1i32, 2, 3] };
let x = t.get_first();
t.do_something_with_x(*x);
}
Basically, we call a method on the struct Test that borrows some value. Then we call another method on the same struct, passing the previously obtained value.
This example works perfectly fine. Now, when we make the content of main generic, it doesn't work anymore.
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(*x);
}
Then I get the following error:
error: cannot move out of borrowed content
src/main.rs:14 let raw_x = *x;
I'm not completely sure why this is happening. Can someone explain to me why Test<i32> isn't borrowed when calling get_first while Test<T> is?
The short answer is that i32 implements the Copy trait, but T does not. If you use fn generic_main<T: Copy>(t: Test<T>), then your immediate problem is fixed.
The longer answer is that Copy is a special trait which means values can be copied by simply copying bits. Types like i32 implement Copy. Types like String do not implement Copy because, for example, it requires a heap allocation. If you copied a String just by copying bits, you'd end up with two String values pointing to the same chunk of memory. That would not be good (it's unsafe!).
Therefore, giving your T a Copy bound is quite restrictive. A less restrictive bound would be T: Clone. The Clone trait is similar to Copy (in that it copies values), but it's usually done by more than just "copying bits." For example, the String type will implement Clone by creating a new heap allocation for the underlying memory.
This requires you to change how your generic_main is written:
fn generic_main<T: Clone>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x.clone());
}
Alternatively, if you don't want to have either the Clone or Copy bounds, then you could change your do_something_with_x method to take a reference to T rather than an owned T:
impl<T> Test<T> {
// other methods elided
fn do_something_with_x(&self, x: &T) {
// Irrelevant
}
}
And your generic_main stays mostly the same, except you don't dereference x:
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x);
}
You can read more about Copy in the docs. There are some nice examples, including how to implement Copy for your own types.

Resources