How can I load all entries of a Vec<T> of arbitrary length onto the stack?

I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.

You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]
fn example<T>(v: Vec<T>) {
let s: [T] = *v.into_boxed_slice();
dbg!(std::mem::size_of_val(&s));
}
fn main() {
let x = vec![42; 100];
example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec
Sure you can.
Vec [...] seems to stride over pointers on the heap, dereferencing at each entry
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
for fast access
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.
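If a chunk size is known at compile time, stable Rust can copy fixed-size pieces of a Vec into a stack array without unsized_locals. The following is only a sketch: the function name sum_in_chunks and the chunk size of 4 are assumptions chosen for illustration, and, as noted above, the copy is unlikely to make access any faster.

```rust
/// Sums a slice by copying fixed-size chunks into an array on the stack.
/// `CHUNK` is an arbitrary size picked for this illustration.
fn sum_in_chunks(v: &[i32]) -> i64 {
    const CHUNK: usize = 4;
    let mut total = 0i64;
    let mut chunks = v.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // `on_stack` is a plain array local; the data is copied out of the heap buffer.
        let mut on_stack = [0i32; CHUNK];
        on_stack.copy_from_slice(chunk);
        total += on_stack.iter().map(|&x| i64::from(x)).sum::<i64>();
    }
    // Elements that did not fill a whole chunk are read directly from the slice.
    total += chunks.remainder().iter().map(|&x| i64::from(x)).sum::<i64>();
    total
}

fn main() {
    let v = vec![42; 100];
    println!("sum = {}", sum_in_chunks(&v));
}
```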

What is the memory layout of a vector of vectors?

I know that some part of the vector (the actual data) is stored in the heap, while some data (a struct containing the length, capacity and pointer to the actual data in heap) is stored on the stack.
What about a vector of vectors (i.e. the elements of the vector are other vectors, e.g. a vector of strings)? What parts of this outer container vector are stored in the heap and on the stack? What about the individual inner elements?
It is not true that a Vec (the struct containing pointer, length and capacity) is always stored on the stack. You can move any type (excluding self-referential ones, which can't be moved) from the stack to the heap, by putting it in a Box, Vec or other heap-using smart pointer. Just consider a straightforward type like i64: it might be stored on the stack (or in a register if the compiler so chooses), but if you write vec![7i64], you have an i64 stored on the heap and the only thing left on the stack is the Vec itself (a pointer plus length and capacity).
With this analogy, it's not hard to see that the same applies for String: it can be on the stack, but you can put it on the heap by creating a Vec<String>. Therefore, if you have a Vec<String> with length 100 and none of the strings is empty, there are 101 independent heap allocations: one owned by the Vec and one owned by each of the Strings (an empty String does not allocate).
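To make the layout visible, here is a small sketch that prints where the outer buffer and each String's buffer live. The helper name buffer_addresses is made up for this example, and the three-word size of the Vec handle is what current implementations use rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Returns the address of each String's own heap buffer.
fn buffer_addresses(v: &[String]) -> Vec<*const u8> {
    v.iter().map(|s| s.as_ptr()).collect()
}

fn main() {
    let v: Vec<String> = vec!["alpha".to_string(), "beta".to_string()];

    // The Vec handle itself (pointer, length, capacity) lives wherever `v` lives,
    // here in main's stack frame.
    println!("size of the Vec handle: {} bytes", size_of::<Vec<String>>());

    // The outer heap buffer holds the String handles back to back.
    println!("outer buffer at {:p}", v.as_ptr());

    // Each non-empty String owns a separate heap buffer for its bytes.
    for (i, p) in buffer_addresses(&v).into_iter().enumerate() {
        println!("string {} bytes at {:p}", i, p);
    }
}
```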
See also
If I make a struct and put it in a vector, does it reside on the heap or the stack?
What is the memory layout of a vector of arrays?

Does Rust protect me from iterator invalidation when pushing to a vector while iterating over it?

Does Rust protect me from iterator invalidation here or am I just lucky with realloc? What guarantees are given for an iterator returned for &'a Vec<T>?
fn main() {
let mut v = vec![0; 2];
println!("capacity: {}", v.capacity());
{
let v_ref = &mut v;
for _each in v_ref.clone() {
for _ in 0..101 {
(*v_ref).push(1); // ?
}
}
}
println!("capacity: {}", v.capacity());
}
In Rust, most methods take an &self - a reference to self. In most circumstances, a call like some_string.len() internally "expands" to something like this:
let a: String = "abc".to_string();
let a_len: usize = String::len(&a); // This is identical to calling `a.len()`.
However, consider a reference to an object: a_ref, which is an &String that references a. Rust is smart enough to determine whether a reference needs to be added or removed, like we saw above (a becomes &a). In this case, a_ref.len() expands to:
let a: String = "abc".to_string();
let a_ref: &String = &a;
let a_len: usize = String::len(a_ref); // This is identical to calling `a_ref.len();`. Since `a_ref` is a reference already, it doesn't need to be altered.
Notice that this is basically equivalent to the original example, except that we're using an explicitly-set reference to a rather than a directly.
This means that v.clone() expands to Vec::clone(&v), and similarly, v_ref.clone() expands to Vec::clone(v_ref), and since v_ref is &v (or, specifically, &mut v), we can simplify this back into Vec::clone(&v). In other words, these calls are equivalent: calling clone() on a basic reference (&) to an object does not clone the reference, it clones the referenced object.
In other words, Tamas Hedgeus' comment is correct: You are iterating over a new vector, which contains elements that are clones of the elements in v. The item being iterated over in your for loop is not a &Vec, it's a Vec that is separate from v, and therefore iterator invalidation is not an issue.
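A reduced version of the question's loop makes the separate vector explicit. The function name push_while_iterating_clone and the snapshot variable are introduced here only for illustration.

```rust
/// Iterates over a clone of `v` while pushing to `v` itself.
fn push_while_iterating_clone(v: &mut Vec<i32>) {
    // `v.clone()` clones the Vec behind the reference, not the reference,
    // so `snapshot` owns its own heap buffer.
    let snapshot: Vec<i32> = v.clone();
    for _each in snapshot {
        for _ in 0..101 {
            // Pushing may reallocate `v`, but the loop never looks at `v`'s buffer.
            v.push(1);
        }
    }
}

fn main() {
    let mut v = vec![0; 2];
    push_while_iterating_clone(&mut v);
    // Two iterations of 101 pushes each, plus the two original elements.
    println!("len: {}", v.len()); // prints "len: 204"
}
```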
As for your question about the guarantees Rust provides, you'll find that Rust's borrow checker handles this rather well without any strings attached.
If you were to remove clone() from the for loop, though, you would receive an error message, use of moved value: '*v_ref', because v_ref is considered 'moved' into the for loop when you iterate over it, and cannot be used for the remainder of the function. To avoid this, the iter function creates an iterator object that only borrows the vector, allowing you to reuse the vector after the loop ends (and the iterator is dropped).
And if you were to try iterating over and mutating v without the v_ref abstraction, the error reads cannot borrow 'v' as mutable because it is also borrowed as immutable. v is borrowed immutably within the iterator spawned by v.iter() (which has a type signature of fn iter(&self) -> Iter<T>; note that it borrows the vector), and Rust's borrow checker will not allow you to mutate the vector until the iterator is dropped (at the end of the for loop). However, since you can have multiple immutable references to a single object, you can still read from the vector within the for loop, just not write into it.
If you need to mutate an element of a vector while iterating over the vector, you can use iter_mut, which returns mutable references to one element at a time and lets you change that element only. You still cannot mutate the iterated vector itself with iter_mut, because Rust ensures that there is only one mutable reference to an object at a time, as well as ensuring there are no mutable references to an object in the same scope as immutable references to that object.
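As a minimal sketch of the iter_mut case (the function name double_in_place is hypothetical), each element can be changed through the mutable reference the iterator hands out, while the vector itself stays off limits inside the loop:

```rust
/// Doubles every element in place using `iter_mut`.
fn double_in_place(v: &mut Vec<i32>) {
    for x in v.iter_mut() {
        // `x` is a `&mut i32` to exactly one element.
        *x *= 2;
        // v.push(0); // rejected by the borrow checker: `v` is already mutably borrowed by the iterator
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    double_in_place(&mut v);
    println!("{:?}", v); // prints "[2, 4, 6]"
}
```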

Convert Vec<T> to Vec<&T>

I can convert Vec<String> to Vec<&str> this way:
let mut items = Vec::<&str>::new();
for item in &another_items {
items.push(item);
}
Are there better alternatives?
There are quite a few ways to do it, some have disadvantages, others simply are more readable to some people.
This dereferences s (which is of type &String) to a String "right hand side reference", which is then dereferenced through the Deref trait to a str "right hand side reference" and then turned back into a &str. This is something that is very commonly seen in the compiler, and I therefore consider it idiomatic.
let v2: Vec<&str> = v.iter().map(|s| &**s).collect();
Here the deref function of the Deref trait is passed to the map function. It's pretty neat but requires bringing the trait into scope with use or giving the full path.
let v3: Vec<&str> = v.iter().map(std::ops::Deref::deref).collect();
This uses coercion syntax.
let v4: Vec<&str> = v.iter().map(|s| s as &str).collect();
This takes a RangeFull slice of the String (just a slice into the entire String) and takes a reference to it. It's ugly in my opinion.
let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect();
This uses coercions to convert a &String into a &str. Can also be replaced by a s: &str expression in the future.
let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect();
The following (thanks @huon-dbaupp) uses the AsRef trait, which solely exists to map from owned types to their respective borrowed type. There are two ways to use it, and again, prettiness of either version is entirely subjective.
let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect();
and
let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect();
My bottom line is use the v8 solution since it most explicitly expresses what you want.
The other answers simply work. I just want to point out that if you are trying to convert the Vec<String> into a Vec<&str> only to pass it to a function taking Vec<&str> as argument, consider revising the function signature as:
fn my_func<T: AsRef<str>>(list: &[T]) { ... }
instead of:
fn my_func(list: &Vec<&str>) { ... }
As pointed out by this question: Function taking both owned and non-owned string collections. In this way both vectors simply work without the need of conversions.
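A small sketch of that signature in use follows. The function total_len is a stand-in invented for this example, since the body of my_func is elided above.

```rust
/// Accepts a slice of anything that can be viewed as a `str`.
fn total_len<T: AsRef<str>>(list: &[T]) -> usize {
    list.iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    let owned: Vec<String> = vec!["ab".to_string(), "cde".to_string()];
    let borrowed: Vec<&str> = vec!["ab", "cde"];

    // Both calls compile without converting one vector into the other;
    // `&Vec<T>` coerces to `&[T]` at the call site.
    println!("{}", total_len(&owned));
    println!("{}", total_len(&borrowed));
}
```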
All of the answers idiomatically use iterators and collecting instead of a loop, but do not explain why this is better.
In your loop, you first create an empty vector and then push into it. Rust makes no guarantees about the strategy it uses for growing vectors, but I believe the current strategy is that whenever the capacity is exceeded, the vector capacity is doubled. If the original vector had a length of 20, that would be one allocation and 5 reallocations, assuming the capacity starts at 1.
Iterating from a vector produces an iterator that has a "size hint". In this case, the iterator implements ExactSizeIterator so it knows exactly how many elements it will return. map retains this and collect takes advantage of this by allocating enough space in one go for an ExactSizeIterator.
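The size hint can be inspected directly. In this sketch the helper to_strs is a name made up for illustration; the code only checks that the resulting capacity is at least the length, since the exact capacity chosen by collect is an implementation detail.

```rust
/// Collects borrowed string slices from a slice of owned Strings.
fn to_strs(items: &[String]) -> Vec<&str> {
    items.iter().map(String::as_str).collect()
}

fn main() {
    let another_items: Vec<String> = vec!["x".to_string(); 20];

    // `map` forwards the exact size hint of the underlying slice iterator.
    let (lower, upper) = another_items.iter().map(String::as_str).size_hint();
    println!("size_hint: ({}, {:?})", lower, upper);

    let items = to_strs(&another_items);
    println!("len = {}, capacity = {}", items.len(), items.capacity());
}
```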
You can also manually do this with:
let mut items = Vec::<&str>::with_capacity(another_items.len());
for item in &another_items {
items.push(item);
}
Heap allocations and reallocations are probably the most expensive part of this entire thing by far; far more expensive than taking references or writing or pushing to a vector when no new heap allocation is involved. It wouldn't surprise me if pushing a thousand elements onto a vector allocated for that length in one go were faster than pushing 5 elements that required 2 reallocations and one allocation in the process.
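One rough way to observe the difference is to count how often the capacity changes while pushing. This is only a sketch: capacity_changes is a hypothetical helper, a capacity change is used as a proxy for a (re)allocation, and the count for the non-preallocated case depends on the growth strategy of the standard library version in use.

```rust
/// Pushes `n` elements and counts how many times the capacity changed.
fn capacity_changes(n: usize, preallocate: bool) -> usize {
    let mut v: Vec<usize> = if preallocate {
        Vec::with_capacity(n)
    } else {
        Vec::new()
    };
    let mut changes = 0;
    let mut last = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last {
            changes += 1;
            last = v.capacity();
        }
    }
    changes
}

fn main() {
    println!("without with_capacity: {} capacity changes", capacity_changes(20, false));
    println!("with with_capacity:    {} capacity changes", capacity_changes(20, true));
}
```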
Another unsung advantage is that the collect approach does not store the result in a mutable variable, which one should not use if it's unneeded.
another_items.iter().map(|item| item.deref()).collect::<Vec<&str>>()
To use deref() you must add use std::ops::Deref;
This one uses collect:
let strs: Vec<&str> = another_items.iter().map(|s| s as &str).collect();
Here is another option:
use std::iter::FromIterator;
let v = Vec::from_iter(v.iter().map(String::as_str));
Note that String::as_str is stable since Rust 1.7.

What is the correct way to convert a Vec for FFI without reallocation?

I need to pass a Vec of elements across the FFI. Experimenting, I came across a few interesting points. I started with giving the FFI all 3: ptr, len and capacity so that I could reconstruct the Vec to destroy it later:
let ptr = vec.as_mut_ptr();
let len = vec.len();
let cap = vec.capacity();
mem::forget(vec);
extern_fn(ptr, len, cap);
// ...
pub unsafe extern "C" fn free(ptr: *mut u8, len: usize, cap: usize) {
let _ = Vec::from_raw_parts(ptr, len, cap);
}
I wanted to get rid of capacity as it's useless to my frontend; it's just so that I can reconstruct my vector to free the memory.
Vec::shrink_to_fit() is tempting as it seems to eliminate the need of dealing with capacity. Unfortunately, the documentation on it does not guarantee that it'll make len == capacity, hence I assume that calling from_raw_parts() with len in place of the capacity would likely trigger Undefined Behavior.
into_boxed_slice() seems to have a guarantee that it's going to make len == capacity from the docs, so I used that next. Please correct me if I'm wrong. The problem is that it does not seem to guarantee no-reallocation. Here is a simple program:
fn main() {
let mut v = Vec::with_capacity(1000);
v.push(100u8);
v.push(110);
let ptr_1 = v.as_mut_ptr();
let mut boxed_slice = v.into_boxed_slice();
let ptr_2 = boxed_slice.as_mut_ptr();
let ptr_3 = Box::into_raw(boxed_slice);
println!("{:?}. {:?}. {:?}", ptr_1, ptr_2, ptr_3);
}
In the playground, it prints:
rustc 1.14.0 (e8a012324 2016-12-16)
0x7fdc9841b000. 0x7fdc98414018. 0x7fdc98414018
This is not good if it has to find new memory instead of being able to shed off extra capacity without causing a copy.
Is there any other way I can pass my vector across the FFI (to C) and not pass capacity? It seems into_boxed_slice() is what I need, but why does it involve re-allocation and copying data?
The reason is relatively simple.
Modern memory allocators will segregate allocations in "sized" slabs, where each slab is responsible for dealing with a given range of sizes. For example:
8 bytes slab: anything from 1 to 8 bytes
16 bytes slab: anything from 9 to 16 bytes
24 bytes slab: anything from 17 to 24 bytes
...
When you allocate memory, you ask for a given size, the allocator finds the right slab, gets a chunk from it, and returns your pointer.
When you deallocate memory... how do you expect the allocator to find the right slab? There are 2 solutions:
the allocator has a way to search for the slab that contains your range of memory, somehow, which involves either a linear search through the slabs or some kind of global look-up table or ...
you tell the allocator what was the size of the allocated block
It's obvious here that the C interface (free, realloc) is rather sub-par, and therefore Rust wishes to use the more efficient interface instead, the one where the onus is on the caller.
So, you have two choices:
1. Pass the capacity
2. Ensure that the length and the capacity are equal
As you realized, (2) may require a new allocation, which is quite undesirable. (1) can be implemented either by passing the capacity the whole way, or by stashing it at some point and retrieving it when you need it.
That's it. You have to evaluate your trade-offs.
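Both choices can be sketched as follows. All four function names are hypothetical, the element type is fixed to u8 to match the question, and the unsafe functions are only sound when called exactly once with values produced by the matching function above them.

```rust
use std::mem::ManuallyDrop;

/// Choice 1: hand out pointer, length and capacity.
fn vec_into_parts(v: Vec<u8>) -> (*mut u8, usize, usize) {
    // ManuallyDrop keeps the buffer alive after this function returns.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

/// Frees a buffer produced by `vec_into_parts`.
unsafe fn free_with_capacity(ptr: *mut u8, len: usize, cap: usize) {
    unsafe { drop(Vec::from_raw_parts(ptr, len, cap)) }
}

/// Choice 2: shed the extra capacity (possibly reallocating) so that
/// the length alone describes the allocation.
fn vec_into_boxed_parts(v: Vec<u8>) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = v.into_boxed_slice();
    let len = boxed.len();
    (Box::into_raw(boxed) as *mut u8, len)
}

/// Frees a buffer produced by `vec_into_boxed_parts`.
unsafe fn free_boxed(ptr: *mut u8, len: usize) {
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))) }
}

fn main() {
    let (ptr, len, cap) = vec_into_parts(vec![100u8, 110]);
    println!("choice 1: {:p}, len {}, cap {}", ptr, len, cap);
    unsafe { free_with_capacity(ptr, len, cap) };

    let (ptr, len) = vec_into_boxed_parts(vec![100u8, 110]);
    println!("choice 2: {:p}, len {}", ptr, len);
    unsafe { free_boxed(ptr, len) };
}
```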

Slices in golang do not allocate any memory?

This link: http://research.swtch.com/godata
It says (third paragraph of section Slices):
Because slices are multiword structures, not pointers, the slicing
operation does not need to allocate memory, not even for the slice
header, which can usually be kept on the stack. This representation
makes slices about as cheap to use as passing around explicit pointer
and length pairs in C. Go originally represented a slice as a pointer
to the structure shown above, but doing so meant that every slice
operation allocated a new memory object. Even with a fast allocator,
that creates a lot of unnecessary work for the garbage collector, and
we found that, as was the case with strings above, programs avoided
slicing operations in favor of passing explicit indices. Removing the
indirection and the allocation made slices cheap enough to avoid
passing explicit indices in most cases.
What...? Why does it not allocate any memory? If it is a multiword structure or a pointer? Does it not need to allocate memory? Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now? Very confused
To expand on Pravin Mishra's answer:
the slicing operation does not need to allocate memory.
"Slicing operation" refers to things like s1[x:y] and not slice initialization or make([]int, x). For example:
var s1 = []int{0, 1, 2, 3, 4, 5} // <<- allocates (or put on stack)
s2 := s1[1:3] // <<- does not (normally) allocate
That is, the second line is similar to:
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
…
example := SliceHeader{uintptr(unsafe.Pointer(&s1[1])), 2, 5} // needs import "unsafe"
Usually local variables like example get put onto the stack. It's just like if this was done instead of using a struct:
var exampleData uintptr
var exampleLen, exampleCap int
Those example* variables go onto the stack.
Only if the code does return &example or otherFunc(&example) or otherwise allows a pointer to this to escape will the compiler be forced to allocate the struct (or slice header) on the heap.
Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now?
Imagine that instead of the above you did:
example2 := &SliceHeader{…same…}
// or
example3 := new(SliceHeader)
example3.Data = …
example3.Len = …
example3.Cap = …
i.e. the type is *SliceHeader rather than SliceHeader.
This is effectively what slices used to be (pre Go 1.0) according to what you mention.
It also used to be that both example2 and example3 would have to be allocated on the heap. That is the "memory for a new object" being referred to. I think that now escape analysis will try to put both of these onto the stack as long as the pointer(s) are kept local to the function, so it's not as big of an issue anymore. Either way though, avoiding one level of indirection is good; it's almost always faster to copy three ints compared to copying a pointer and dereferencing it repeatedly.
Every data type allocates memory when it's initialized. In the blog post, the author clearly mentions
the slicing operation does not need to allocate memory.
And he is right. Now see how slices work in Go.
Slices hold references to an underlying array, and if you assign one
slice to another, both refer to the same array. If a function takes a
slice argument, changes it makes to the elements of the slice will be
visible to the caller, analogous to passing a pointer to the
underlying array.
