Dropping buffer when writing custom Vec - vector

Reading the Rustonomicon, I found this implementation of a custom Vec<T>:
pub struct IntoIter<T> {
buf: NonNull<T>,
cap: usize,
start: *const T,
end: *const T,
_marker: PhantomData<T>,
}
and its impl of IntoIterator:
impl<T> IntoIterator for Vec<T> {
type Item = T;
type IntoIter = IntoIter<T>;
fn into_iter(self) -> IntoIter<T> {
// Can't destructure Vec since it's Drop
let ptr = self.ptr;
let cap = self.cap;
let len = self.len;
// Make sure not to drop Vec since that would free the buffer
mem::forget(self);
unsafe {
IntoIter {
buf: ptr,
cap: cap,
start: ptr.as_ptr(),
end: if cap == 0 {
// can't offset off this pointer, it's not allocated!
ptr.as_ptr()
} else {
ptr.as_ptr().add(len)
},
_marker: PhantomData,
}
}
}
}
I want to ask about this specific line:
// Make sure not to drop Vec since that would free the buffer
mem::forget(self);
In which case could the buffer be freed? More specifically, in that piece of code.
Or could it be released by another implementation that, when called first, frees the buffer, so that using the freed buffer afterwards causes a use-after-free error?
Would it be enough to just mem::forget(self.buf)?

Without the forget, the self object would be dropped when that function returns, and its drop method will be called. The drop implementation would then free the buffer, which is not what we want, because the IntoIter is using it.
Using forget, we get rid of the self object without running its drop function, letting the IntoIter object semantically take ownership of the buffer, use it, and free it when it's done using it.
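A minimal sketch of the difference, assuming a hypothetical `Owner` type whose `Drop` impl only records that it ran (standing in for the point where a real `Vec` would free its buffer). It also hints at why forgetting just the pointer would not help: in the code above the field is `self.ptr`, a `NonNull<T>`, which is `Copy` and has no destructor, so `mem::forget` on it is a no-op and `self` would still be dropped.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical stand-in for a Vec: its Drop impl marks the moment
// the real Vec would free its heap buffer.
struct Owner<'a> {
    freed: &'a Cell<bool>,
}

impl Drop for Owner<'_> {
    fn drop(&mut self) {
        self.freed.set(true); // a real Vec would deallocate here
    }
}

// Like `into_iter` without `mem::forget`: the `Copy` field is copied out,
// but `o` itself is still dropped at the end of the function.
fn copy_then_drop<'a>(o: Owner<'a>) -> &'a Cell<bool> {
    o.freed
}

// Like `into_iter` with `mem::forget`: Drop never runs, so the "buffer"
// is handed over to whoever holds the copied fields instead of being freed.
fn copy_then_forget<'a>(o: Owner<'a>) -> &'a Cell<bool> {
    let freed = o.freed;
    mem::forget(o);
    freed
}

// Returns whether the "buffer" was freed in each case.
fn freed_in_each_case() -> (bool, bool) {
    let dropped = Cell::new(false);
    copy_then_drop(Owner { freed: &dropped });
    let forgotten = Cell::new(false);
    copy_then_forget(Owner { freed: &forgotten });
    (dropped.get(), forgotten.get())
}
```

Calling `freed_in_each_case()` should yield `(true, false)`: only the forgetting path leaves the resource alive for the new owner.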

Related

How does Rust model iterators? Stack or Heap?

I know that vectors in Rust are allocated on the heap where the pointer, capacity, and length of the vector are stored on the stack.
Let's say I have the following vector:
let vec = vec![1, 2, 3];
If I make an iterator from this vector:
let vec_iter = vec.iter();
How does Rust model this iterator in terms of allocation on the heap vs. stack? Is it the same as the vector?
Most iterators are stack allocated.
In cases like Vec::iter(), they create iterators that hold two pointers, one to the first element and one to the end (one past the last element), like so:
use std::marker::PhantomData;
pub struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: PhantomData<&'a T>,
}
Since a raw pointer doesn't convey ownership or lifetime, PhantomData<&'a T> tells the compiler that this struct holds a reference of lifetime 'a to type T.
Iter::next looks somewhat like this
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item> {
unsafe {// pointer dereferencing is only allowed in unsafe
if self.ptr == self.end {
None
} else {
let old = self.ptr;
self.ptr = self.ptr.offset(1);
Some(&*old)
}
}
}
}
And a new Iter is created like so
impl<'a, T: 'a> Iter<'a, T> {
pub fn new(slice: &'a [T]) -> Self {
assert_ne!(std::mem::size_of::<T>(), 0); // doesn't handle zero size type
let start = slice.as_ptr();
Iter {
ptr: start,
end: unsafe { start.add(slice.len()) },
_marker: PhantomData,
}
}
}
Now we can use it like any other iterator:
let v = vec!['a', 'b', 'c', 'd', 'e'];
for c in Iter::new(&v) {
println!("{c}");
}
And thanks to PhantomData, the compiler can guard us against use after free and other memory issues.
let iter = {
let v = vec!['a', 'b', 'c', 'd', 'e'];
Iter::new(&v) // error! borrowed value doesn't live long enough
};
for c in iter {
println!("{c}");
}
the pointer, capacity, and length of the vector are stored on the stack
Not really. They are stored wherever the user wishes, which may be on the stack, in the global data segment, or on the heap:
// In the global data segment
static VEC: Vec<()> = Vec::new();
struct Foo {
v: Vec<()>,
}
fn main() {
// On the stack
let v: Vec<()> = Vec::new();
// On the heap
let f = Box::new(Foo { v: Vec::new() });
}
And the same goes for iterators. Most of the time they simply hold references to the original data, wherever that may be, and the references themselves are stored inside the iterator struct wherever the user puts it.
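As a rough illustration, here is a sketch (the `Holder` type and the function names are hypothetical) that stores the same kind of iterator as a plain local, inside a `Box<dyn Iterator>`, and as a field of a boxed struct. In every case the iterator only borrows the elements; just the small iterator value itself is placed somewhere. Whether a local really ends up on the stack is up to the compiler (it may live only in registers).

```rust
// The iterator value itself lives on the heap here, while it still
// only borrows the slice's elements.
fn boxed_iter<'a>(v: &'a [i32]) -> Box<dyn Iterator<Item = &'a i32> + 'a> {
    Box::new(v.iter())
}

// Hypothetical holder: an iterator stored as a field lives wherever
// the enclosing value lives.
struct Holder<'a> {
    it: std::slice::Iter<'a, i32>,
}

// Builds three iterators over the same Vec, stored in three different
// places, and sums each one.
fn sums_from_three_places() -> (i32, i32, i32) {
    let v = vec![1, 2, 3];
    let on_stack = v.iter(); // a plain local
    let on_heap = boxed_iter(&v); // a boxed trait object
    let in_boxed_struct = Box::new(Holder { it: v.iter() }); // field of a heap value
    (
        on_stack.sum(),
        on_heap.sum(),
        in_boxed_struct.it.clone().sum(),
    )
}
```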

Stack of references in unsafe Rust, but ensuring that the unsafeness does not leak out of the stack?

I'm implementing some recursive code, where function instances deeper down in the call stack may need to refer to data from prior frames. However, I only have non-mut access to those data, so I receive those data as references. As such, I would need to keep references to those data in a stack data structure that can be accessed from the deeper instances.
To illustrate:
// I would like to implement this RefStack class properly, without per-item memory allocations
struct RefStack<T: ?Sized> {
content: Vec<&T>,
}
impl<T: ?Sized> RefStack<T> {
fn new() -> Self { Self{ content: Vec::new() } }
fn get(&self, index: usize) -> &T { self.content[index] }
fn len(&self) -> usize { self.content.len() }
fn with_element<F: FnOnce(&mut Self)>(&mut self, el: &T, f: F) {
self.content.push(el);
f(self);
self.content.pop();
}
}
// This is just an example demonstrating how I would need to use the RefStack class
fn do_recursion(n: usize, node: &LinkedListNode, st: &mut RefStack<str>) {
// get references to one or more items in the stack
// the references should be allowed to live until the end of this function, but shouldn't prevent me from calling with_element() later
let tmp: &str = st.get(rng.gen_range(0, st.len()));
// do stuff with those references (println is just an example)
println!("Item: {}", tmp);
// recurse deeper if necessary
if n > 0 {
let (head, tail): (_, &LinkedListNode) = node.get_parts();
manager.get_str(head, |s: &str| // the actual string is a local variable somewhere in the implementation details of get_str()
st.with_element(s, |st| do_recursion(n - 1, tail, st))
);
}
// do more stuff with those references (println is just an example)
println!("Item: {}", tmp);
}
fn main() {
do_recursion(100, list /* gotten from somewhere else */, &mut RefStack::new());
}
In the example above, I'm concerned about how to implement RefStack without any per-item memory allocations. The occasional allocations by the Vec are acceptable - those are few and far between. The LinkedListNode is just an example - in practice it's some complicated graph data structure, but the same thing applies - I only have a non-mut reference to it, and the closure given to manager.get_str() only provides a non-mut str. Note that the non-mut str passed into the closure may only be constructed in the get_str() implementation, so we cannot assume that all the &str have the same lifetime.
I'm fairly certain that RefStack can't be implemented in safe Rust without copying out the str into owned Strings, so my question is how this can be done in unsafe Rust. It feels like I might be able to get a solution such that:
The unsafeness is confined to the implementation of RefStack
The reference returned by st.get() should live at least as long as the current instance of the do_recursion function (in particular, it should be able to live past the call to st.with_element(), and this is logically safe since the &T that is returned by st.get() isn't referring to any memory owned by the RefStack anyway)
How can such a struct be implemented in (unsafe) Rust?
It feels that I could just cast the element references to pointers and store them as pointers, but I will still face difficulties expressing the requirement in the second bullet point above when casting them back to references. Or is there a better way (or by any chance is such a struct implementable in safe Rust, or already in some library somewhere)?
I think storing raw pointers is the way to go. You need a PhantomData to store the lifetime and get proper covariance:
use std::marker::PhantomData;
struct RefStack<'a, T: ?Sized> {
content: Vec<*const T>,
_pd: PhantomData<&'a T>,
}
impl<'a, T: ?Sized> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(), _pd: PhantomData
}
}
fn get(&self, index: usize) -> &'a T {
unsafe { &*self.content[index] }
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
where 'a: 't,
{
self.content.push(el);
let mut tmp = RefStack {
content: std::mem::take(&mut self.content),
_pd: PhantomData,
};
f(&mut tmp);
self.content = tmp.content;
self.content.pop();
}
}
The only unsafe code is in converting the pointer back into a reference.
The tricky part is getting with_element right. I think that the where 'a: 't is implicit, because the whole impl depends on it (but better safe than sorry).
The last problem is how to convert a RefStack<'a, T> into a RefStack<'t, T>. I'm pretty sure I could just std::mem::transmute it. But that would be an extra unsafe to pay attention to, and creating a new temporary stack is quite trivial.
About the 't lifetime
You may think that this 't lifetime is not actually needed, but not adding it may cause subtle unsoundness, as the callback could call get() and get values with a lifetime 'a that is actually longer than the inserted value.
For example, this code should not compile. With the 't it correctly fails, but without it, it compiles and causes undefined behavior:
fn breaking<'a, 's, 'x>(st: &'s mut RefStack<'a, i32>, v: &'x mut Vec<&'a i32>) {
v.push(st.get(0));
}
fn main() {
let mut st = RefStack::<i32>::new();
let mut y = Vec::new();
{
let i = 42;
st.with_element(&i, |stack| breaking(stack, &mut y));
}
println!("{:?}", y);
}
About panic!.
When doing these kinds of unsafe things, particularly when you are calling user code, as we are doing now in with_element, we have to consider what would happen if it panics. In the OP's code, the last object will not be popped, and when the stack is unwound, any drop implementation could see the now-dangling reference. My code is OK in case of panics because, if f(&mut tmp) panics, the dangling references die with the local temporary tmp, while self.content is left empty.
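To see both points in action, here is a sketch that reproduces rodrigo's RefStack and adds two hypothetical helpers: `collect`, which mirrors the question's `do_recursion` (each level pushes a reference to a local `String`, and the `&str` returned by `get` is still used after `with_element` returns), and `len_after_panic`, which uses `std::panic::catch_unwind` to check that the outer stack is left empty when the callback panics.

```rust
use std::marker::PhantomData;
use std::panic::{self, AssertUnwindSafe};

// rodrigo's RefStack, repeated so this sketch compiles on its own.
struct RefStack<'a, T: ?Sized> {
    content: Vec<*const T>,
    _pd: PhantomData<&'a T>,
}

impl<'a, T: ?Sized> RefStack<'a, T> {
    fn new() -> Self {
        RefStack { content: Vec::new(), _pd: PhantomData }
    }
    fn get(&self, index: usize) -> &'a T {
        unsafe { &*self.content[index] }
    }
    fn len(&self) -> usize {
        self.content.len()
    }
    fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
    where
        'a: 't,
    {
        self.content.push(el);
        let mut tmp = RefStack { content: std::mem::take(&mut self.content), _pd: PhantomData };
        f(&mut tmp);
        self.content = tmp.content;
        self.content.pop();
    }
}

// Hypothetical recursion in the style of `do_recursion`.
fn collect(n: usize, st: &mut RefStack<'_, str>, out: &mut Vec<String>) {
    // `get` returns `&'a str`, a borrow of a caller's local rather than of `st`,
    // so it may be held across the `with_element` call below.
    let top = if st.len() > 0 { Some(st.get(st.len() - 1)) } else { None };
    if n > 0 {
        let local = format!("level {n}"); // lives only in this frame
        st.with_element(&local, |st| collect(n - 1, st, out));
    }
    if let Some(top) = top {
        out.push(top.to_string());
    }
}

// Returns the outer stack's length after the callback panicked.
fn len_after_panic() -> usize {
    let mut st: RefStack<'_, str> = RefStack::new();
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let local = String::from("short-lived");
        st.with_element(&local, |_| panic!("callback failed"));
    }));
    assert!(result.is_err());
    st.len()
}
```

With `collect(3, ...)` the innermost level records its top first, so `out` should end up as `["level 1", "level 2", "level 3"]`, and the stack is empty again afterwards.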
Disclaimer: this answer originally used traits, and it was a nightmare; Francis Gagne pointed out rightly that using an Option for the tail was a much better alternative, hence the answer was much simplified.
Given the structure of your usage, with the stack in RefStack following the usage of the stack frames, you can simply put elements on the stack frames and build a stack from that.
The main advantage of such an approach is that it is entirely safe. You can review the whole code here, or follow the blow-by-blow description below.
The key idea is to build a so-called cons-list.
#[derive(Debug)]
struct Stack<'a, T> {
element: &'a T,
tail: Option<&'a Stack<'a, T>>,
}
impl<'a, T> Stack<'a, T> {
fn new(element: &'a T) -> Self { Stack { element, tail: None } }
fn top(&self) -> &T { self.element }
fn get(&self, index: usize) -> Option<&T> {
if index == 0 {
Some(self.element)
} else {
self.tail.and_then(|tail| tail.get(index - 1))
}
}
fn tail(&self) -> Option<&'a Stack<'a, T>> { self.tail }
fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> { Stack { element, tail: Some(self) } }
}
A simple example of usage is:
fn immediate() {
let (a, b, c) = (0, 1, 2);
let root = Stack::new(&a);
let middle = root.push(&b);
let top = middle.push(&c);
println!("{:?}", top);
}
Which just prints the stack, yielding:
Stack { element: 2, tail: Some(Stack { element: 1, tail: Some(Stack { element: 0, tail: None }) }) }
And a more elaborate recursive version:
fn recursive(n: usize) {
fn inner(n: usize, stack: &Stack<'_, i32>) {
if n == 0 {
print!("{:?}", stack);
return;
}
let element = n as i32;
let stacked = stack.push(&element);
inner(n - 1, &stacked);
}
if n == 0 {
println!("()");
return;
}
let element = n as i32;
let root = Stack::new(&element);
inner(n - 1, &root);
}
Which prints:
Stack { element: 1, tail: Some(Stack { element: 2, tail: Some(Stack { element: 3, tail: None }) }) }
The one downside is that get performance may not be so good; it has linear complexity. On the other hand, cache-wise sticking to the stack frames is pretty nice. If you mostly access the first few elements, I expect it'll be good enough.
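A sketch of the question's access pattern on top of this cons-list, assuming a hypothetical `depths` helper: each level owns a local `String`, pushes a reference to it, and reads the element its caller pushed with `get(1)`. The `Stack` type is repeated from above so the block compiles on its own.

```rust
#[derive(Debug)]
struct Stack<'a, T> {
    element: &'a T,
    tail: Option<&'a Stack<'a, T>>,
}

impl<'a, T> Stack<'a, T> {
    fn new(element: &'a T) -> Self { Stack { element, tail: None } }
    fn top(&self) -> &T { self.element }
    fn get(&self, index: usize) -> Option<&T> {
        if index == 0 {
            Some(self.element)
        } else {
            // Linear walk down the tail links.
            self.tail.and_then(|tail| tail.get(index - 1))
        }
    }
    fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> { Stack { element, tail: Some(self) } }
}

// Hypothetical recursion: every frame owns a String and links it onto the stack.
fn depths(n: usize, stack: &Stack<'_, String>, out: &mut Vec<String>) {
    // Index 0 is this frame's element, index 1 the caller's.
    if let Some(parent) = stack.get(1) {
        out.push(format!("{} <- {}", stack.top(), parent));
    }
    if n > 0 {
        let local = format!("level {n}");
        depths(n - 1, &stack.push(&local), out);
    }
}
```

Starting from a root holding `"root"`, `depths(2, ...)` should record `"level 2 <- root"` and then `"level 1 <- level 2"`.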
Disclaimer: a different answer, with a different trade-off.
Compared to my other answer, this answer presents a solution that is:
unsafe: it's encapsulated, but subtle.
simpler to use.
simpler code, likely faster.
The idea is to still use the stack to bind the lifetimes of the references, yet store all the references in a single Vec for O(1) random access. So we're building a stack on the stack, but not storing the references themselves on the stack. Alright?
The full code is available here.
The stack itself is very easily defined:
struct StackRoot<T: ?Sized>(Vec<*const T>);
struct Stack<'a, T: ?Sized>{
len: usize,
stack: &'a mut Vec<*const T>,
}
impl<T: ?Sized> StackRoot<T> {
fn new() -> Self { Self(vec!()) }
fn stack(&mut self) -> Stack<'_, T> { Stack { len: 0, stack: &mut self.0 } }
}
The implementation of Stack is trickier, as always when unsafe is involved:
impl<'a, T: ?Sized> Stack<'a, T> {
fn len(&self) -> usize { self.len }
fn get(&self, index: usize) -> Option<&'a T> {
if index < self.len {
// Safety:
// - Index is in bounds as per the branch above.
// - Lifetime of reference is guaranteed to be at least 'a (see push).
Some(unsafe { &**self.stack.get_unchecked(index) })
} else {
None
}
}
fn push<'b>(&'b mut self, element: &'b T) -> Stack<'b, T>
where
'a: 'b
{
// Stacks could have been built and forgotten, resulting in `self.stack`
// containing references to further elements, so that the newly pushed
// element would not be at index `self.len`, as expected.
//
// Note that on top of being functionally important, it's also a safety
// requirement: `self` should never be able to access elements that are
// not guaranteed to have a lifetime longer than `'a`.
self.stack.truncate(self.len);
self.stack.push(element as *const _);
Stack { len: self.len + 1, stack: &mut *self.stack }
}
}
impl<'a, T: ?Sized> Drop for Stack<'a, T> {
fn drop(&mut self) {
self.stack.truncate(self.len);
}
}
Do note the unsafe here; the invariant is that the 'a parameter is always stricter than the lifetimes of the elements pushed onto the stack so far.
By refusing to access elements pushed by other members, we thus guarantee that the lifetime of the returned reference is valid.
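To make the truncation in `push` concrete, here is a sketch (the `forgotten_child_demo` function is hypothetical) that leaks a child stack with `std::mem::forget`, so its `Drop` never truncates the shared `Vec`, and then pushes again from the parent. The types are repeated from above so the block stands alone.

```rust
struct StackRoot<T: ?Sized>(Vec<*const T>);

struct Stack<'a, T: ?Sized> {
    len: usize,
    stack: &'a mut Vec<*const T>,
}

impl<T: ?Sized> StackRoot<T> {
    fn new() -> Self { Self(Vec::new()) }
    fn stack(&mut self) -> Stack<'_, T> { Stack { len: 0, stack: &mut self.0 } }
}

impl<'a, T: ?Sized> Stack<'a, T> {
    fn len(&self) -> usize { self.len }
    fn get(&self, index: usize) -> Option<&'a T> {
        if index < self.len {
            // Safety: in bounds, and every element below `len` outlives 'a.
            Some(unsafe { &**self.stack.get_unchecked(index) })
        } else {
            None
        }
    }
    fn push<'b>(&'b mut self, element: &'b T) -> Stack<'b, T>
    where
        'a: 'b,
    {
        // Discard anything a forgotten child stack may have left behind.
        self.stack.truncate(self.len);
        self.stack.push(element as *const T);
        Stack { len: self.len + 1, stack: &mut *self.stack }
    }
}

impl<'a, T: ?Sized> Drop for Stack<'a, T> {
    fn drop(&mut self) {
        self.stack.truncate(self.len);
    }
}

// Returns (element at index 1, element at index 2, len) of the final stack.
fn forgotten_child_demo() -> (Option<String>, Option<String>, usize) {
    let mut root = StackRoot::<str>::new();
    let mut base = root.stack();
    let mut first = base.push("a");
    // The leaked child never runs Drop, so "b" stays in the shared Vec.
    std::mem::forget(first.push("b"));
    // This push truncates back to `first.len` (1) before pushing "c".
    let second = first.push("c");
    (second.get(1).map(str::to_owned), second.get(2).map(str::to_owned), second.len())
}
```

The expected result is `(Some("c"), None, 2)`: the leaked `"b"` is overwritten rather than reachable from `second`.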
It does require a generic definition of do_recursion, however generic lifetime parameters are erased at code generation, so there's no code bloat involved:
fn do_recursion<'a, 'b>(nodes: &[&'a str], stack: &mut Stack<'b, str>)
where
'a: 'b
{
let tmp: &str = stack.get(stack.len() - 1).expect("Not empty");
println!("{:?}", tmp);
if let [head, tail @ ..] = nodes {
let mut new = stack.push(head);
do_recursion(tail, &mut new);
}
}
A simple main to show it off:
fn main() {
let nodes = ["Hello", ",", "World", "!"];
let mut root = StackRoot::new();
let mut stack = root.stack();
let mut stack = stack.push(nodes[0]);
do_recursion(&nodes[1..], &mut stack);
}
Resulting in:
"Hello"
","
"World"
"!"
Based on rodrigo's answer, I implemented this slightly simpler version:
struct RefStack<'a, T: ?Sized + 'static> {
content: Vec<&'a T>,
}
impl<'a, T: ?Sized + 'static> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),
}
}
fn get(&self, index: usize) -> &'a T {
self.content[index]
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F>(&mut self, el: &'t T, f: F)
where
F: FnOnce(&mut RefStack<'t, T>),
'a: 't,
{
let mut st = RefStack {
content: std::mem::take(&mut self.content),
};
st.content.push(el);
f(&mut st);
st.content.pop();
self.content = unsafe { std::mem::transmute(st.content) };
}
}
The only difference to rodrigo's solution is that the vector is represented as vector of references instead of pointers, so we don't need the PhantomData and the unsafe code to access an element.
When a new element is pushed to the stack in with_element(), we require that it has a shorter lifetime than the existing elements with the 'a: 't bound. We then create a new stack with the shorter lifetime, which is possible in safe code since we know the data the references in the vector are pointing to even lives for the longer lifetime 'a. We then push the new element with lifetime 't to the new vector, again in safe code, and only after we have removed that element again do we move the vector back into its original place. This requires unsafe code since we are extending the lifetime of the references in the vector from 't to 'a this time. We know this is safe, since the vector is back to its original state, but the compiler doesn't know this.
I feel this version represents the intent better than rodrigo's almost identical version. The type of the vector is always "correct", in that it describes that the elements are actually references, not raw pointers, and it always assigns the correct lifetime to the vector. And we use unsafe code exactly in the place where something potentially unsafe happens: when extending the lifetime of the references in the vector.

Can I return a struct which uses PhantomData from a trait implementation to add a lifetime to a raw pointer without polluting the interface?

In this question someone commented that you could use PhantomData to add a lifetime bound to a raw pointer inside a struct. I thought I'd try doing this on an existing piece of code I've been working on.
Here's our (minimised) starting point. This compiles (playground):
extern crate libc;
use libc::{c_void, free, malloc};
trait Trace {}
struct MyTrace {
#[allow(dead_code)]
buf: *mut c_void,
}
impl MyTrace {
fn new() -> Self {
Self {
buf: unsafe { malloc(128) },
}
}
}
impl Trace for MyTrace {}
impl Drop for MyTrace {
fn drop(&mut self) {
unsafe { free(self.buf) };
}
}
trait Tracer {
fn start(&mut self);
fn stop(&mut self) -> Box<Trace>;
}
struct MyTracer {
trace: Option<MyTrace>,
}
impl MyTracer {
fn new() -> Self {
Self { trace: None }
}
}
impl Tracer for MyTracer {
fn start(&mut self) {
self.trace = Some(MyTrace::new());
// Pretend the buffer is mutated in C here...
}
fn stop(&mut self) -> Box<Trace> {
Box::new(self.trace.take().unwrap())
}
}
fn main() {
let mut tracer = MyTracer::new();
tracer.start();
let _trace = tracer.stop();
println!("Hello, world!");
}
I think that the problem with the above code is that I could in theory move the buf pointer out of a MyTrace and use it after the struct has died. In this case the underlying buffer will have been freed due to the Drop implementation.
By using a PhantomData we can ensure that only references to buf can be obtained, and that the lifetimes of those references are bound to the instances of MyTrace from whence they came.
We can proceed like this (playground):
extern crate libc;
use libc::{c_void, free, malloc};
use std::marker::PhantomData;
trait Trace {}
struct MyTrace<'b> {
#[allow(dead_code)]
buf: *mut c_void,
_phantom: PhantomData<&'b c_void>,
}
impl<'b> MyTrace<'b> {
fn new() -> Self {
Self {
buf: unsafe { malloc(128) },
_phantom: PhantomData,
}
}
}
impl<'b> Trace for MyTrace<'b> {}
impl<'b> Drop for MyTrace<'b> {
fn drop(&mut self) {
unsafe { free(self.buf) };
}
}
trait Tracer {
fn start(&mut self);
fn stop(&mut self) -> Box<Trace>;
}
struct MyTracer<'b> {
trace: Option<MyTrace<'b>>,
}
impl<'b> MyTracer<'b> {
fn new() -> Self {
Self { trace: None }
}
}
impl<'b> Tracer for MyTracer<'b> {
fn start(&mut self) {
self.trace = Some(MyTrace::new());
// Pretend the buffer is mutated in C here...
}
fn stop(&mut self) -> Box<Trace> {
Box::new(self.trace.take().unwrap())
}
}
fn main() {
let mut tracer = MyTracer::new();
tracer.start();
let _trace = tracer.stop();
println!("Hello, world!");
}
But this will give the error:
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
--> src/main.rs:53:36
|
53 | Box::new(self.trace.take().unwrap())
| ^^^^^^
|
note: first, the lifetime cannot outlive the lifetime 'b as defined on the impl at 46:1...
--> src/main.rs:46:1
|
46 | impl<'b> Tracer for MyTracer<'b> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: ...so that the types are compatible:
expected std::option::Option<MyTrace<'_>>
found std::option::Option<MyTrace<'b>>
= note: but, the lifetime must be valid for the static lifetime...
= note: ...so that the expression is assignable:
expected std::boxed::Box<Trace + 'static>
found std::boxed::Box<Trace>
I have three sub-questions:
Did I understand the motivation for PhantomData in this scenario correctly?
Where is 'static coming from in the error message?
Can this be made to work without changing the interface of stop? Specifically, without adding a lifetime to the return type?
I'm going to ignore your direct question because I believe you arrived at it after misunderstanding several initial steps.
I could in theory move the buf pointer out of a MyTrace and use if after the struct has died
Copy the pointer, not move it, but yes.
By using a PhantomData we can ensure that only references to buf can be obtained
This is not true. It is still equally easy to get a copy of the raw pointer and misuse it even when you add a PhantomData.
Did I understand the motivation for PhantomData in this scenario correctly?
No. PhantomData is used when you want to act like you have a value of some type without actually having it. Pretending to have a reference to something is only useful when there is something to have a reference to. There's no such value to reference in your example.
The Rust docs say something about raw pointers and PhantomData, but I perhaps got it wrong
That example actually shows my point well. The Slice type is intended to behave as if it has a reference to the Vec that it's borrowed from:
fn borrow_vec<'a, T>(vec: &'a Vec<T>) -> Slice<'a, T>
Since this Slice type doesn't actually have a reference, it needs a PhantomData to act like it has a reference. Note that the lifetime 'a isn't just made up out of whole cloth — it's related to an existing value (the Vec). It would cause memory unsafety for the Slice to exist after the Vec has moved, thus it makes sense to include a lifetime of the Vec.
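For reference, here is the `Slice` example adapted from the `std::marker::PhantomData` documentation; the `len` and `get` methods are hypothetical additions for illustration. The `PhantomData<&'a T>` ties every `Slice` to a borrow of the `Vec`, so the borrow checker rejects dropping or moving the `Vec` while a `Slice` is alive.

```rust
use std::marker::PhantomData;

struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    phantom: PhantomData<&'a T>, // acts like a `&'a T` into the Vec
}

fn borrow_vec<'a, T>(vec: &'a Vec<T>) -> Slice<'a, T> {
    let ptr = vec.as_ptr();
    Slice {
        start: ptr,
        end: unsafe { ptr.add(vec.len()) },
        phantom: PhantomData,
    }
}

impl<'a, T> Slice<'a, T> {
    // Not for zero-sized T: `offset_from` panics for those.
    fn len(&self) -> usize {
        // Safety: both pointers come from the same Vec allocation.
        unsafe { self.end.offset_from(self.start) as usize }
    }

    // The returned reference carries the borrowed lifetime 'a.
    fn get(&self, i: usize) -> Option<&'a T> {
        if i < self.len() {
            // Safety: in bounds, and the Vec is borrowed for 'a.
            Some(unsafe { &*self.start.add(i) })
        } else {
            None
        }
    }
}
```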
why the commenter in the other question suggested I use PhantomData to improve the type safety of my raw pointer
You can use PhantomData to improve the safety of raw pointers that act as references, but yours doesn't have some existing Rust value to reference. You can also use it for correctness if your pointer owns some value behind the reference, which yours seemingly does. However, since it's a c_void, it's not really useful. You'd usually see it as PhantomData<MyOwnedType>.
Where is 'static coming from in the error message?
See the related question: Why is adding a lifetime to a trait with the plus operator (Iterator<Item = &Foo> + 'a) needed?
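In short, a boxed trait object with no explicit lifetime, such as `Box<dyn Trace>` (the question's `Box<Trace>` is the older spelling of the same type), defaults to `Box<dyn Trace + 'static>`, which is where the `'static` in the error comes from. A minimal sketch with a hypothetical `Borrowing<'b>` type; the `len` method is added to `Trace` only so there is something to call:

```rust
trait Trace {
    fn len(&self) -> usize;
}

// Hypothetical type that borrows its buffer for 'b.
struct Borrowing<'b> {
    data: &'b [u8],
}

impl<'b> Trace for Borrowing<'b> {
    fn len(&self) -> usize {
        self.data.len()
    }
}

// `Box<dyn Trace>` means `Box<dyn Trace + 'static>`, so the boxed value
// may only hold 'static borrows.
fn boxed_static() -> Box<dyn Trace> {
    static DATA: [u8; 3] = [1, 2, 3];
    Box::new(Borrowing { data: &DATA }) // fine: 'b = 'static
}

// Naming the lifetime lets a shorter borrow through; with a plain
// `Box<dyn Trace>` return type this function would not compile.
fn boxed_borrowing<'b>(data: &'b [u8]) -> Box<dyn Trace + 'b> {
    Box::new(Borrowing { data })
}
```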

Is it safe to use a raw pointer to access the &T of a RefCell<HashMap<T>>?

I have a cache-like structure which internally uses a HashMap:
impl Cache {
fn insert(&mut self, k: u32, v: String) {
self.map.insert(k, v);
}
fn borrow(&self, k: u32) -> Option<&String> {
self.map.get(&k)
}
}
Now I need internal mutability. Since HashMap does not implement Copy, my guess is that RefCell is the path to follow. Writing the insert method is straightforward, but I encountered problems with the borrow function. I could return a Ref<String>, but since I'd like to cache the result, I wrote a small Ref wrapper:
struct CacheRef<'a> {
borrow: Ref<'a, HashMap<u32, String>>,
value: &'a String,
}
This won't work since value references borrow, so the struct can't be constructed. I know that the reference is always valid: The map can't be mutated, because Ref locks the map. Is it safe to use a raw pointer instead of a reference?
struct CacheRef<'a> {
borrow: Ref<'a, HashMap<u32, String>>,
value: *const String,
}
Am I overlooking something here? Are there better (or faster) options? I'm trying to avoid RefCell due to the runtime overhead.
I'll complement @Shepmaster's safe but not quite as efficient answer with the unsafe version. For this, we'll pack some unsafe code in a utility function.
fn map_option<'a, T, F, U>(r: Ref<'a, T>, f: F) -> Option<Ref<'a, U>>
where
F: FnOnce(&'a T) -> Option<&'a U>
{
let stolen = &*r as *const T;
let ur = f(unsafe { &*stolen }).map(|sr| sr as *const U);
match ur {
Some(u) => Some(Ref::map(r, |_| unsafe { &*u })),
None => None
}
}
I'm pretty sure this code is correct. Although the compiler is rather unhappy with the lifetimes, they work out. We just have to inject some raw pointers to make the compiler shut up.
With this, the implementation of borrow becomes trivial:
fn borrow<'a>(&'a self, k: u32) -> Option<Ref<'a, String>> {
map_option(self.map.borrow(), |m| m.get(&k))
}
The utility function only works for Option<&T>. Other containers (such as Result) would require their own modified copy, or else GATs or HKTs to implement generically.
I'm going to ignore your direct question in favor of a definitely safe alternative:
impl Cache {
fn insert(&self, k: u32, v: String) {
self.map.borrow_mut().insert(k, v);
}
fn borrow<'a>(&'a self, k: u32) -> Option<Ref<'a, String>> {
let borrow = self.map.borrow();
if borrow.contains_key(&k) {
Some(Ref::map(borrow, |hm| {
hm.get(&k).unwrap()
}))
} else {
None
}
}
}
Ref::map allows you to take a Ref<'a, T> and convert it into a Ref<'a, U>. The ugly part of this solution is that we have to look up the key in the hashmap twice because I can't figure out how to make the ideal solution work:
Ref::map(borrow, |hm| {
hm.get(&k) // Returns an `Option`, not a `&...`
})
This might require Generic Associated Types (GATs) and even then the return type might be a Ref<Option<T>>.
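On Rust 1.63 and later there is `Ref::filter_map`, which takes a closure returning `Option<&U>` and therefore allows a single lookup without any unsafe code. A sketch, assuming the question's `Cache` wraps a `RefCell<HashMap<u32, String>>` in a field named `map`:

```rust
use std::cell::{Ref, RefCell};
use std::collections::HashMap;

struct Cache {
    map: RefCell<HashMap<u32, String>>,
}

impl Cache {
    fn insert(&self, k: u32, v: String) {
        self.map.borrow_mut().insert(k, v);
    }

    // Single lookup: `Ref::filter_map` keeps the borrow when the closure
    // returns `Some`, and hands the original `Ref` back in `Err` otherwise;
    // `.ok()` then drops it, releasing the borrow on a miss.
    fn borrow(&self, k: u32) -> Option<Ref<'_, String>> {
        Ref::filter_map(self.map.borrow(), |hm| hm.get(&k)).ok()
    }
}
```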
As mentioned by Shepmaster, it is better to avoid unsafe when possible.
There are multiple possibilities:
Ref::map, with double look-up (as illustrated by Shepmaster's answer),
Ref::map with sentinel value,
Cloning the return value.
Personally, I'd consider the latter first. Store Rc<String> in your map and your method can easily return an Option<Rc<String>>, which completely sidesteps the issues:
fn get(&self, k: u32) -> Option<Rc<String>> {
self.map.borrow().get(&k).cloned()
}
As a bonus, your cache is not "locked" any longer while you use the result.
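A sketch of the `Rc<String>` variant; the `RcCache` name is hypothetical. Because the `Ref` guard is only a temporary inside `get`, it is released before `get` returns, so holding the returned `Rc` does not block later inserts:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

struct RcCache {
    map: RefCell<HashMap<u32, Rc<String>>>,
}

impl RcCache {
    fn insert(&self, k: u32, v: String) {
        self.map.borrow_mut().insert(k, Rc::new(v));
    }

    // Cloning an Rc only bumps a reference count; the String is not copied.
    fn get(&self, k: u32) -> Option<Rc<String>> {
        self.map.borrow().get(&k).cloned()
    }
}
```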
Or, alternatively, you can work around the fact that Ref::map does not like Option by using a sentinel value:
fn borrow<'a>(&'a self, k: u32) -> Ref<'a, str> {
let borrow = self.map.borrow();
Ref::map(borrow, |map| map.get(&k).map(|s| &s[..]).unwrap_or(""))
}

How to get the index of an element in a vector using pointer arithmetic?

In C you can use the pointer offset to get index of an element within an array, e.g.:
index = element_pointer - &vector[0];
Given a reference to an element in an array, this should be possible in Rust too.
While Rust has the ability to get the memory address from vector elements, convert them to usize, then subtract them - is there a more convenient/idiomatic way to do this in Rust?
There isn't a simpler way. I think the reason is that it would be hard to guarantee that any operation or method that gave you that answer only allowed you to use it with a Vec (or more likely a slice) and something inside that collection; Rust wouldn't want to allow you to call it with a reference into a different vector.
More idiomatic would be to avoid needing to do it in the first place. Because of lifetimes, you can't store references into a Vec anywhere very permanent, away from the Vec, so you're likely to have the index handy whenever you've got the reference anyway.
In particular, when iterating, you'd use enumerate() to iterate over (index, &item) pairs.
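A small sketch of that style, with a hypothetical `find_with_index` helper that returns the matching element together with its index, so no pointer arithmetic is needed:

```rust
// Returns the first element matching `pred`, paired with its index.
// `enumerate` carries the index along with each reference.
fn find_with_index<T>(items: &[T], pred: impl Fn(&T) -> bool) -> Option<(usize, &T)> {
    items.iter().enumerate().find(|(_, item)| pred(*item))
}
```

If only the index is needed, `Iterator::position` does the same job.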
So, given the issues which people have brought up with the pointers and stuff, the best way, imho, to do this is:
fn index_of_unchecked<T>(slice: &[T], item: &T) -> usize {
if std::mem::size_of::<T>() == 0 {
return 0; // do what you will with this case
}
(item as *const T as usize - slice.as_ptr() as usize)
/ std::mem::size_of::<T>()
}
// note: for zero-sized types you may want to
// return Some(0) if item as *const T == slice.as_ptr()
// and None otherwise
fn index_of<T>(slice: &[T], item: &T) -> Option<usize> {
let ptr = item as *const T;
// as_ptr_range() is the half-open range [start, end), so the first
// element is included and the one-past-the-end address is not
if slice.as_ptr_range().contains(&ptr) {
Some(index_of_unchecked(slice, item))
} else {
None
}
}
although, if you want methods:
trait IndexOfExt<T> {
fn index_of_unchecked(&self, item: &T) -> usize;
fn index_of(&self, item: &T) -> Option<usize>;
}
impl<T> IndexOfExt<T> for [T] {
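fn index_of_unchecked(&self, item: &T) -> usize {
// delegates to the free function defined above
index_of_unchecked(self, item)
}
fn index_of(&self, item: &T) -> Option<usize> {
index_of(self, item)
}
}
and then, with the trait in scope, you'll be able to call these methods on any type that derefs to [T], such as Vec<T>.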

Resources