Reading the rustonomicon, I found this implementation of a custom Vec<T>.
pub struct IntoIter<T> {
buf: NonNull<T>,
cap: usize,
start: *const T,
end: *const T,
_marker: PhantomData<T>,
}
and its impl of IntoIterator:
impl<T> IntoIterator for Vec<T> {
type Item = T;
type IntoIter = IntoIter<T>;
fn into_iter(self) -> IntoIter<T> {
// Can't destructure Vec since it's Drop
let ptr = self.ptr;
let cap = self.cap;
let len = self.len;
// Make sure not to drop Vec since that would free the buffer
mem::forget(self);
unsafe {
IntoIter {
buf: ptr,
cap: cap,
start: ptr.as_ptr(),
end: if cap == 0 {
// can't offset off this pointer, it's not allocated!
ptr.as_ptr()
} else {
ptr.as_ptr().add(len)
},
_marker: PhantomData,
}
}
}
}
I want to ask about this specific line:
// Make sure not to drop Vec since that would free the buffer
mem::forget(self);
In which case could the buffer be freed? More specifically, in this piece of code.
Or could it be released in another implementation that, when called first, frees the buffer and then causes a use-after-free error?
Would it be enough to just mem::forget(self.buf)?
Without the forget, the self object would be dropped when that function returns, and its drop method will be called. The drop implementation would then free the buffer, which is not what we want, because the IntoIter is using it.
Using forget, we get rid of the self object without running its drop function, letting the IntoIter object semantically take ownership of the buffer, use it, and free it when it's done using it.
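To make the difference concrete, here is a minimal sketch. Owner and Taker are hypothetical stand-ins for Vec and IntoIter (they are not the Rustonomicon types), and a Cell<bool> flag plays the role of "the buffer was freed":

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical stand-in for Vec<T>: its Drop impl "frees the buffer".
struct Owner<'a> {
    freed: &'a Cell<bool>,
}

impl Drop for Owner<'_> {
    fn drop(&mut self) {
        // In the real Vec this is where the allocation would be released.
        self.freed.set(true);
    }
}

// Hypothetical stand-in for IntoIter<T>: it takes over the buffer.
struct Taker<'a> {
    freed: &'a Cell<bool>,
}

impl<'a> Owner<'a> {
    fn into_taker(self) -> Taker<'a> {
        // Copy the field out first; a type that implements Drop cannot be destructured.
        let freed = self.freed;
        // Skip Owner::drop so the "buffer" stays alive for Taker.
        mem::forget(self);
        Taker { freed }
    }
}

// Returns (freed after into_taker, freed after a plain drop).
fn demo() -> (bool, bool) {
    let flag_forget = Cell::new(false);
    let taker = Owner { freed: &flag_forget }.into_taker();
    let after_forget = taker.freed.get();

    let flag_drop = Cell::new(false);
    {
        let _owner = Owner { freed: &flag_drop };
    } // _owner goes out of scope here, so Owner::drop runs
    (after_forget, flag_drop.get())
}
```

Calling demo() should give (false, true): the forgotten Owner never runs its drop, while the one that simply goes out of scope does. As for mem::forget(self.buf): buf is a Copy pointer, so that would only forget a copy of it, and self would still be dropped at the end of the function.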
I'm implementing some recursive code, where function instances deeper down in the call stack may need to refer to data from prior frames. However, I only have non-mut access to those data, so I receive those data as references. As such, I would need to keep references to those data in a stack data structure that can be accessed from the deeper instances.
To illustrate:
// I would like to implement this RefStack class properly, without per-item memory allocations
struct RefStack<T: ?Sized> {
content: Vec<&T>,
}
impl<T: ?Sized> RefStack<T> {
fn new() -> Self { Self{ content: Vec::new() } }
fn get(&self, index: usize) -> &T { self.content[index] }
fn len(&self) -> usize { self.content.len() }
fn with_element<F: FnOnce(&mut Self)>(&mut self, el: &T, f: F) {
self.content.push(el);
f(self);
self.content.pop();
}
}
// This is just an example demonstrating how I would need to use the RefStack class
fn do_recursion(n: usize, node: &LinkedListNode, st: &mut RefStack<str>) {
// get references to one or more items in the stack
// the references should be allowed to live until the end of this function, but shouldn't prevent me from calling with_element() later
let tmp: &str = st.get(rng.gen_range(0, st.len()));
// do stuff with those references (println is just an example)
println!("Item: {}", tmp);
// recurse deeper if necessary
if n > 0 {
let (head, tail): (_, &LinkedListNode) = node.get_parts();
manager.get_str(head, |s: &str| // the actual string is a local variable somewhere in the implementation details of get_str()
st.with_element(s, |st| do_recursion(n - 1, tail, st))
);
}
// do more stuff with those references (println is just an example)
println!("Item: {}", tmp);
}
fn main() {
do_recursion(100, list /* gotten from somewhere else */, &mut RefStack::new());
}
In the example above, I'm concerned about how to implement RefStack without any per-item memory allocations. The occasional allocations by the Vec are acceptable - those are few and far between. The LinkedListNode is just an example - in practice it's some complicated graph data structure, but the same thing applies - I only have a non-mut reference to it, and the closure given to manager.get_str() only provides a non-mut str. Note that the non-mut str passed into the closure may only be constructed in the get_str() implementation, so we cannot assume that all the &str have the same lifetime.
I'm fairly certain that RefStack can't be implemented in safe Rust without copying out the str into owned Strings, so my question is how this can be done in unsafe Rust. It feels like I might be able to get a solution such that:
- The unsafeness is confined to the implementation of RefStack
- The reference returned by st.get() should live at least as long as the current instance of the do_recursion function (in particular, it should be able to live past the call to st.with_element(), and this is logically safe since the &T that is returned by st.get() isn't referring to any memory owned by the RefStack anyway)
How can such a struct be implemented in (unsafe) Rust?
It feels that I could just cast the element references to pointers and store them as pointers, but I will still face difficulties expressing the requirement in the second bullet point above when casting them back to references. Or is there a better way (or by any chance is such a struct implementable in safe Rust, or already in some library somewhere)?
I think storing raw pointers is the way to go. You need a PhantomData to store the lifetime and get proper covariance:
use std::marker::PhantomData;
struct RefStack<'a, T: ?Sized> {
content: Vec<*const T>,
_pd: PhantomData<&'a T>,
}
impl<'a, T: ?Sized> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),_pd: PhantomData
}
}
fn get(&self, index: usize) -> &'a T {
unsafe { &*self.content[index] }
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
where 'a: 't,
{
self.content.push(el);
let mut tmp = RefStack {
content: std::mem::take(&mut self.content),
_pd: PhantomData,
};
f(&mut tmp);
self.content = tmp.content;
self.content.pop();
}
}
(Playground)
The only unsafe code is in converting the pointer back into a reference.
The tricky part is getting with_element right. I think that the where 'a: 't is implicit, because the whole impl depends on it (but better safe than sorry).
The last problem is how to convert a RefStack<'a, T> into a RefStack<'t, T>. I'm pretty sure I could just std::mem::transmute it. But that would be an extra unsafe to pay attention to, and creating a new temporary stack is quite trivial.
About the 't lifetime
You may think that this 't lifetime is not actually needed, but not adding it may cause subtle unsoundness, as the callback could call get() and get values with a lifetime 'a that is actually longer than the inserted value.
For example this code should not compile. With the 't it correctly fails, but without it it compiles and causes undefined behavior:
fn breaking<'a, 's, 'x>(st: &'s mut RefStack<'a, i32>, v: &'x mut Vec<&'a i32>) {
v.push(st.get(0));
}
fn main() {
let mut st = RefStack::<i32>::new();
let mut y = Vec::new();
{
let i = 42;
st.with_element(&i, |stack| breaking(stack, &mut y));
}
println!("{:?}", y);
}
About panic!.
When doing this kind of unsafe thing, particularly when you are calling user code, as we are doing now in with_element, we have to consider what would happen if it panics. In the OP code, the last object will not be popped, and when the stack is unwound, any drop implementation could see the now dangling reference. My code is ok in case of panics because, if f(&mut tmp); panics, the dangling references die in the local temporary tmp while self.content is empty.
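The panic behaviour can be checked with a small experiment. This is only a sketch: RefStack below is a trimmed copy of the struct from this answer (get is left out), and demo is a hypothetical helper that uses std::panic::catch_unwind to simulate a panic in user code:

```rust
use std::marker::PhantomData;
use std::panic::{catch_unwind, AssertUnwindSafe};

struct RefStack<'a, T: ?Sized> {
    content: Vec<*const T>,
    _pd: PhantomData<&'a T>,
}

impl<'a, T: ?Sized> RefStack<'a, T> {
    fn new() -> Self {
        RefStack { content: Vec::new(), _pd: PhantomData }
    }
    fn len(&self) -> usize {
        self.content.len()
    }
    fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
    where
        'a: 't,
    {
        self.content.push(el);
        // While `f` runs, the pointers live in `tmp` and `self.content` is empty.
        let mut tmp = RefStack {
            content: std::mem::take(&mut self.content),
            _pd: PhantomData,
        };
        f(&mut tmp);
        self.content = tmp.content;
        self.content.pop();
    }
}

// Returns (stack length before the panic, stack length after the caught panic).
fn demo() -> (usize, usize) {
    let outer = 1;
    let mut st = RefStack::<i32>::new();
    let mut len_inside = 0;
    let mut len_after_panic = usize::MAX;
    st.with_element(&outer, |st| {
        len_inside = st.len();
        let inner = 2;
        // Simulate user code that panics inside the nested callback.
        let result = catch_unwind(AssertUnwindSafe(|| {
            st.with_element(&inner, |_| panic!("simulated failure in user code"));
        }));
        assert!(result.is_err());
        // `tmp` was dropped during unwinding, so nothing dangling was left behind.
        len_after_panic = st.len();
    });
    (len_inside, len_after_panic)
}
```

With this version the stack seen after the caught panic is empty (the earlier entries are lost as well), so the price of a panic is losing the contents, not a dangling pointer.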
Disclaimer: this answer originally used traits, and it was a nightmare; Francis Gagne pointed out rightly that using an Option for the tail was a much better alternative, hence the answer was much simplified.
Given the structure of your usage, with the stack in RefStack following the usage of the stack frames, you can simply put elements on the stack frames and build a stack from that.
The main advantage of such an approach is that it is entirely safe. You can review the whole code here, or follow the blow-by-blow description below.
The key idea is to build a so-called cons-list.
#[derive(Debug)]
struct Stack<'a, T> {
element: &'a T,
tail: Option<&'a Stack<'a, T>>,
}
impl<'a, T> Stack<'a, T> {
fn new(element: &'a T) -> Self { Stack { element, tail: None } }
fn top(&self) -> &T { self.element }
fn get(&self, index: usize) -> Option<&T> {
if index == 0 {
Some(self.element)
} else {
self.tail.and_then(|tail| tail.get(index - 1))
}
}
fn tail(&self) -> Option<&'a Stack<'a, T>> { self.tail }
fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> { Stack { element, tail: Some(self) } }
}
A simple example of usage is:
fn immediate() {
let (a, b, c) = (0, 1, 2);
let root = Stack::new(&a);
let middle = root.push(&b);
let top = middle.push(&c);
println!("{:?}", top);
}
Which just prints the stack, yielding:
Stack { element: 2, tail: Some(Stack { element: 1, tail: Some(Stack { element: 0, tail: None }) }) }
And a more elaborate recursive version:
fn recursive(n: usize) {
fn inner(n: usize, stack: &Stack<'_, i32>) {
if n == 0 {
print!("{:?}", stack);
return;
}
let element = n as i32;
let stacked = stack.push(&element);
inner(n - 1, &stacked);
}
if n == 0 {
println!("()");
return;
}
let element = n as i32;
let root = Stack::new(&element);
inner(n - 1, &root);
}
Which prints:
Stack { element: 1, tail: Some(Stack { element: 2, tail: Some(Stack { element: 3, tail: None }) }) }
The one downside is that get performance may not be so good; it has linear complexity. On the other hand, cache-wise sticking to the stack frames is pretty nice. If you mostly access the first few elements, I expect it'll be good enough.
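To illustrate the indexing cost, here is a small sketch that repeats the Stack definition from above (minus Debug and the unused accessors) and adds a hypothetical demo function. Index 0 is the most recently pushed element, and get(i) follows i tail links:

```rust
struct Stack<'a, T> {
    element: &'a T,
    tail: Option<&'a Stack<'a, T>>,
}

impl<'a, T> Stack<'a, T> {
    fn new(element: &'a T) -> Self {
        Stack { element, tail: None }
    }

    // Walks `index` links towards the root, hence the linear complexity.
    fn get(&self, index: usize) -> Option<&T> {
        if index == 0 {
            Some(self.element)
        } else {
            self.tail.and_then(|tail| tail.get(index - 1))
        }
    }

    fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> {
        Stack { element, tail: Some(self) }
    }
}

// Returns the elements found at indices 0, 2 and 3 of a three-element stack.
fn demo() -> (Option<i32>, Option<i32>, Option<i32>) {
    let (a, b, c) = (10, 20, 30);
    let root = Stack::new(&a);
    let middle = root.push(&b);
    let top = middle.push(&c);
    (top.get(0).copied(), top.get(2).copied(), top.get(3).copied())
}
```

Here demo() should return (Some(30), Some(10), None): the top is found immediately, the root needs two hops, and an out-of-range index falls off the end of the list.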
Disclaimer: A different answer; with a different trade-off.
Compared to my other answer, this answer presents a solution that is:
- unsafe: it's encapsulated, but subtle.
- simpler to use.
- simpler code, likely faster.
The idea is to still use the stack to bind the lifetimes of the references, yet store all references in a single Vec for O(1) random access. So we're building a stack on the stack, but not storing the references themselves on the stack. Alright?
The full code is available here.
The stack itself is very easily defined:
struct StackRoot<T: ?Sized>(Vec<*const T>);
struct Stack<'a, T: ?Sized>{
len: usize,
stack: &'a mut Vec<*const T>,
}
impl<T: ?Sized> StackRoot<T> {
fn new() -> Self { Self(vec!()) }
fn stack(&mut self) -> Stack<'_, T> { Stack { len: 0, stack: &mut self.0 } }
}
The implementation of Stack is trickier, as always when unsafe is involved:
impl<'a, T: ?Sized> Stack<'a, T> {
fn len(&self) -> usize { self.len }
fn get(&self, index: usize) -> Option<&'a T> {
if index < self.len {
// Safety:
// - Index is in bounds as per above branch.
// - Lifetime of reference is guaranteed to be at least 'a (see push).
Some(unsafe { &**self.stack.get_unchecked(index) })
} else {
None
}
}
fn push<'b>(&'b mut self, element: &'b T) -> Stack<'b, T>
where
'a: 'b
{
// Stacks could have been built and forgotten, resulting in `self.stack`
// containing references to further elements, so that the newly pushed
// element would not be at index `self.len`, as expected.
//
// Note that on top of being functionally important, it's also a safety
// requirement: `self` should never be able to access elements that are
// not guaranteed to have a lifetime longer than `'a`.
self.stack.truncate(self.len);
self.stack.push(element as *const _);
Stack { len: self.len + 1, stack: &mut *self.stack }
}
}
impl<'a, T: ?Sized> Drop for Stack<'a, T> {
fn drop(&mut self) {
self.stack.truncate(self.len);
}
}
Do note the unsafe here; the invariant is that the 'a parameter is always stricter than the lifetimes of the elements pushed into the stack so far.
By refusing to access elements pushed by other members, we thus guarantee that the lifetime of the returned reference is valid.
It does require a generic definition of do_recursion, however generic lifetime parameters are erased at code generation, so there's no code bloat involved:
fn do_recursion<'a, 'b>(nodes: &[&'a str], stack: &mut Stack<'b, str>)
where
'a: 'b
{
let tmp: &str = stack.get(stack.len() - 1).expect("Not empty");
println!("{:?}", tmp);
if let [head, tail @ ..] = nodes {
let mut new = stack.push(head);
do_recursion(tail, &mut new);
}
}
A simple main to show it off:
fn main() {
let nodes = ["Hello", ",", "World", "!"];
let mut root = StackRoot::new();
let mut stack = root.stack();
let mut stack = stack.push(nodes[0]);
do_recursion(&nodes[1..], &mut stack);
}
Resulting in:
"Hello"
","
"World"
"!"
Based on rodrigo's answer, I implemented this slightly simpler version:
struct RefStack<'a, T: ?Sized + 'static> {
content: Vec<&'a T>,
}
impl<'a, T: ?Sized + 'static> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),
}
}
fn get(&self, index: usize) -> &'a T {
self.content[index]
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F>(&mut self, el: &'t T, f: F)
where
F: FnOnce(&mut RefStack<'t, T>),
'a: 't,
{
let mut st = RefStack {
content: std::mem::take(&mut self.content),
};
st.content.push(el);
f(&mut st);
st.content.pop();
self.content = unsafe { std::mem::transmute(st.content) };
}
}
The only difference to rodrigo's solution is that the vector is represented as vector of references instead of pointers, so we don't need the PhantomData and the unsafe code to access an element.
When a new element is pushed to the stack in with_element(), we require that it has a shorter lifetime than the existing elements with the 'a: 't bound. We then create a new stack with the shorter lifetime, which is possible in safe code since we know the data the references in the vector point to lives for the longer lifetime 'a. We then push the new element with lifetime 't to the new vector, again in safe code, and only after we have removed that element again do we move the vector back to its original place. This requires unsafe code since we are extending the lifetime of the references in the vector from 't to 'a this time. We know this is safe, since the vector is back in its original state, but the compiler doesn't know this.
I feel this version represents the intent better than rodrigo's almost identical version. The type of the vector is always "correct", in that it describes that the elements are actually references, not raw pointers, and it always assigns the correct lifetime to the vector. And we use unsafe code exactly in the place where something potentially unsafe happens: when extending the lifetime of the references in the vector.
I'm filling a vector in parallel, but for this generalized question, I've only found hints and no answers.
The code below works, but I want to switch to Rng::fill instead of iterating over each chunk. It might not be possible to have multiple mutable slices inside a single Vec; I'm not sure.
extern crate rayon;
extern crate rand;
extern crate rand_xoshiro;
use rand::{Rng, SeedableRng};
use rand_xoshiro::Xoshiro256StarStar;
use rayon::prelude::*;
use std::{iter, env};
use std::sync::{Arc, Mutex};
// i16 because I was filling up my RAM for large tests...
fn gen_rand_vec(data: &mut [i16]) {
let num_threads = rayon::current_num_threads();
let mut rng = rand::thread_rng();
let mut prng = Xoshiro256StarStar::from_rng(&mut rng).unwrap();
// lazy iterator of fast, unique RNGs
// Arc and Mutex are just so it can be accessed from multiple threads
let rng_it = Arc::new(Mutex::new(iter::repeat(()).map(|()| {
let new_prng = prng.clone();
prng.jump();
new_prng
})));
// generates random numbers for each chunk in parallel
// par_chunks_mut is parallel version of chunks_mut
data.par_chunks_mut(data.len() / num_threads).for_each(|chunk| {
// I used extra braces because it might be required to unlock Mutex.
// Not sure.
let mut prng = { rng_it.lock().unwrap().next().unwrap() };
for i in chunk.iter_mut() {
*i = prng.gen_range(-1024, 1024);
}
});
}
It turns out that a ChunksMut iterator gives slices. I'm not sure how to glean that from the documentation. I figured it out by reading the source:
#[derive(Debug)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct ChunksMut<'a, T:'a> {
v: &'a mut [T],
chunk_size: usize
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T> Iterator for ChunksMut<'a, T> {
type Item = &'a mut [T];
#[inline]
fn next(&mut self) -> Option<&'a mut [T]> {
if self.v.is_empty() {
None
} else {
let sz = cmp::min(self.v.len(), self.chunk_size);
let tmp = mem::replace(&mut self.v, &mut []);
let (head, tail) = tmp.split_at_mut(sz);
self.v = tail;
Some(head)
}
}
}
I guess it's just experience; to others it must be obvious that an iterator of type ChunksMut<T> from Vec<T> returns objects of type [T]. That makes sense now. It just wasn't very clear with the intermediate struct.
pub fn chunks_mut(&mut self, chunk_size: usize) -> ChunksMut<T>
// ...
impl<'a, T> Iterator for ChunksMut<'a, T>
Reading this, it looked like the iterator returned objects of type T, the same as Vec<T>.iter(), which wouldn't make sense.
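Since each item is a &mut [T], every chunk can be handed to any function that fills a slice in one call, which is the shape a call like Rng::fill needs. Below is a minimal standard-library-only sketch; fill_chunk and fill_in_chunks are hypothetical helpers, and the tiny linear congruential generator only stands in for a real PRNG:

```rust
// Hypothetical helper: fills a whole slice in one call from a tiny linear
// congruential generator. It stands in for a slice-filling RNG call.
fn fill_chunk(chunk: &mut [i16], seed: u32) {
    let mut state = seed;
    for x in chunk.iter_mut() {
        state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        // The top 11 bits give 0..2048; shift that to -1024..1024.
        *x = ((state >> 21) as i16) - 1024;
    }
}

// Splits `data` into chunks and fills each one; returns the number of chunks.
fn fill_in_chunks(data: &mut [i16], chunk_size: usize) -> usize {
    let mut chunks_seen = 0;
    for (i, chunk) in data.chunks_mut(chunk_size).enumerate() {
        // The iterator item is a mutable sub-slice, not a single element.
        let chunk: &mut [i16] = chunk;
        fill_chunk(chunk, i as u32 + 1);
        chunks_seen += 1;
    }
    chunks_seen
}
```

The chunks are disjoint mutable slices of the same Vec, which is why handing them to different threads (as par_chunks_mut does) is allowed.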
In C you can use the pointer offset to get index of an element within an array, e.g.:
index = element_pointer - &vector[0];
Given a reference to an element in an array, this should be possible in Rust too.
While Rust has the ability to get the memory address from vector elements, convert them to usize, then subtract them - is there a more convenient/idiomatic way to do this in Rust?
There isn't a simpler way. I think the reason is that it would be hard to guarantee that any operation or method that gave you that answer only allowed you to use it with a Vec (or more likely slice) and something inside that collection; Rust wouldn't want to allow you to call it with a reference into a different vector.
More idiomatic would be to avoid needing to do it in the first place. You can't store references into Vec anywhere very permanent away from the Vec anyway due to lifetimes, so you're likely to have the index handy when you've got the reference anyway.
For example, when iterating you'd use enumerate to iterate over (index, &item) pairs.
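A minimal sketch of that pattern (first_negative is a hypothetical example function): the index travels alongside the reference, so no pointer arithmetic is needed.

```rust
// Returns the index of the first negative value together with a reference to it.
fn first_negative(values: &[i32]) -> Option<(usize, &i32)> {
    for (index, item) in values.iter().enumerate() {
        if *item < 0 {
            return Some((index, item));
        }
    }
    None
}
```

For instance, first_negative(&[3, 5, -2, -7]) yields Some((2, &-2)).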
So, given the issues which people have brought up with the pointers and stuff; the best way, imho, to do this is:
fn index_of_unchecked<T>(slice: &[T], item: &T) -> usize {
if ::std::mem::size_of::<T>() == 0 {
return 0; // do what you will with this case
}
(item as *const _ as usize - slice.as_ptr() as usize)
/ std::mem::size_of::<T>()
}
// note: for zero sized types here, you
// return Some(0) if item as *const T == slice.as_ptr()
// and None otherwise
fn index_of<T>(slice: &[T], item: &T) -> Option<usize> {
let ptr = item as *const T;
if
// `<=` so that the first element is accepted; wrapping_add is safe to call
slice.as_ptr() <= ptr &&
slice.as_ptr().wrapping_add(slice.len()) > ptr
{
Some(index_of_unchecked(slice, item))
} else {
None
}
}
although, if you want methods:
trait IndexOfExt<T> {
fn index_of_unchecked(&self, item: &T) -> usize;
fn index_of(&self, item: &T) -> Option<usize>;
}
impl<T> IndexOfExt<T> for [T] {
fn index_of_unchecked(&self, item: &T) -> usize {
// ...
}
fn index_of(&self, item: &T) -> Option<usize> {
// ...
}
}
and then you'll be able to use this method for any type that Derefs to [T]
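For completeness, here is one way the checked method could be filled in. This is a sketch, not the only possible implementation: it compares addresses as usize, and it assumes that returning None for zero-sized types is acceptable.

```rust
use std::mem::size_of;

trait IndexOfExt<T> {
    fn index_of(&self, item: &T) -> Option<usize>;
}

impl<T> IndexOfExt<T> for [T] {
    fn index_of(&self, item: &T) -> Option<usize> {
        let size = size_of::<T>();
        if size == 0 {
            // Assumption: addresses cannot tell zero-sized elements apart.
            return None;
        }
        let start = self.as_ptr() as usize;
        let end = start + self.len() * size;
        let addr = item as *const T as usize;
        if addr < start || addr >= end {
            // `item` does not point into this slice.
            return None;
        }
        Some((addr - start) / size)
    }
}
```

With the trait in scope, v.index_of(&v[2]) on a Vec<u32> returns Some(2) through auto-deref, and a reference to a value stored elsewhere returns None.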
With the generous help of the rust community I managed to get the base of a topological data structure assembled using managed pointers. This came together rather nicely and I was pretty excited about Rust in general. Then I read this post (which seems like a reasonable plan) and it inspired me to back track and try to re-assemble it using only owned pointers if possible.
This is the working version using managed pointers:
struct Dart<T> {
alpha: ~[@mut Dart<T>],
embed: ~[@mut T],
tagged: bool
}
impl<T> Dart<T> {
pub fn new(dim: uint) -> @mut Dart<T> {
let mut dart = @mut Dart{alpha: ~[], embed: ~[], tagged: false};
dart.alpha = vec::from_elem(dim, dart);
return dart;
}
pub fn get_dim(&self) -> uint {
return self.alpha.len();
}
pub fn traverse(@mut self, invs: &[uint], f: &fn(&Dart<T>)) {
let dim = self.get_dim();
for invs.each |i| {if *i >= dim {return}}; //test bounds on invs vec
if invs.len() == 2 {
let spread:int = int::abs(invs[1] as int - invs[0] as int);
if spread == 1 { //simple loop
let mut dart = self;
let mut i = invs[0];
while !dart.tagged {
dart.tagged = true;
f(dart);
dart = dart.alpha[i];
if i == invs[0] {i = invs[1];}
else {i = invs[0];}
} }
// else if spread == 2 { // max 4 cells traversed
// }
}
else {
let mut stack = ~[self];
self.tagged = true;
while !stack.is_empty() {
let mut dart = stack.pop();
f(dart);
for invs.each |i| {
if !dart.alpha[*i].tagged {
dart.alpha[*i].tagged = true;
stack.push(dart);
} } } } } }
After a few hours of chasing lifetime errors I have come to the conclusion that this may not even be possible with owned pointers due to the cyclic nature (without tying the knot, as I was warned). My feeble attempt at this is below. My question: is this structure possible to implement without resorting to managed pointers? And if not, is the code above considered reasonably "rusty" (idiomatic Rust)? Thanks.
struct GMap<'self,T> {
dim: uint,
darts: ~[~Dart<'self,T>]
}
struct Dart<'self,T> {
alpha: ~[&'self mut Dart<'self, T>],
embed: ~[&'self mut T],
tagged: bool
}
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let mut dart = ~Dart{alpha: ~[], embed: ~[], tagged: false};
let dartRef: &'self mut Dart<'self, T> = dart;
dartRef.alpha = vec::from_elem(self.dim, copy dartRef);
self.darts.push(dart);
}
}
I'm pretty sure that using &mut pointers is impossible, since one can only have one such pointer in existence at a time, e.g.:
fn main() {
let mut i = 0;
let a = &mut i;
let b = &mut i;
}
and-mut.rs:4:12: 4:18 error: cannot borrow `i` as mutable more than once at at a time
and-mut.rs:4 let b = &mut i;
^~~~~~
and-mut.rs:3:12: 3:18 note: second borrow of `i` as mutable occurs here
and-mut.rs:3 let a = &mut i;
^~~~~~
error: aborting due to previous error
One could get around the borrow checker unsafely by storing an unsafe pointer to the memory (ptr::to_mut_unsafe_ptr), or safely by storing indices into the darts member of GMap. Essentially, this keeps a single reference to the memory (in self.darts), and all operations have to go through it.
This might look like:
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let ind = self.darts.len();
self.darts.push(~Dart{alpha: vec::from_elem(self.dim, ind), embed: ~[], tagged: false});
}
}
traverse would need to change to either be a method on GMap (e.g. fn(&mut self, node_ind: uint, invs: &[uint], f: &fn(&Dart<T>))), or at least take a GMap type.
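For readers on current Rust, the index-based design could look roughly like the sketch below. It uses today's syntax rather than the syntax of this thread, drops the embed field for brevity, and sew is a hypothetical helper for linking two darts:

```rust
struct Dart {
    alpha: Vec<usize>, // indices into GMap::darts, one per dimension
    tagged: bool,
}

struct GMap {
    dim: usize,
    darts: Vec<Dart>,
}

impl GMap {
    fn new(dim: usize) -> Self {
        GMap { dim, darts: Vec::new() }
    }

    // A new dart initially points at itself in every dimension.
    fn new_dart(&mut self) -> usize {
        let ind = self.darts.len();
        self.darts.push(Dart { alpha: vec![ind; self.dim], tagged: false });
        ind
    }

    // Hypothetical helper: link darts `a` and `b` through involution `i`.
    fn sew(&mut self, i: usize, a: usize, b: usize) {
        self.darts[a].alpha[i] = b;
        self.darts[b].alpha[i] = a;
    }

    // Visits every dart reachable from `start` through the involutions in `invs`.
    // Tags are not reset afterwards, as in the original code.
    fn traverse(&mut self, start: usize, invs: &[usize], mut f: impl FnMut(usize, &Dart)) {
        if start >= self.darts.len() || invs.iter().any(|&i| i >= self.dim) {
            return; // out-of-range dart index or involution index
        }
        let mut stack = vec![start];
        self.darts[start].tagged = true;
        while let Some(d) = stack.pop() {
            f(d, &self.darts[d]);
            for &i in invs {
                let next = self.darts[d].alpha[i];
                if !self.darts[next].tagged {
                    self.darts[next].tagged = true;
                    stack.push(next);
                }
            }
        }
    }
}

// Builds four darts, links three of them, and returns the indices visited from the first.
fn demo() -> Vec<usize> {
    let mut map = GMap::new(2);
    let a = map.new_dart();
    let b = map.new_dart();
    let c = map.new_dart();
    let _isolated = map.new_dart();
    map.sew(0, a, b);
    map.sew(1, b, c);
    let mut visited = Vec::new();
    map.traverse(a, &[0, 1], |ind, _dart| visited.push(ind));
    visited.sort();
    visited
}
```

Because every dart is owned by the one darts vector and the links are plain indices, cycles need neither shared mutable pointers nor lifetimes on the struct; demo() visits darts 0, 1 and 2 and never reaches the isolated one.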
(On an entirely different note, there is library support for external iterators, which are far more composable than the internal iterators (the ones that take a closure). So defining one of these for traverse may (or may not) make using it nicer.)