Does Rust protect me from iterator invalidation when pushing to a vector while iterating over it?

Does Rust protect me from iterator invalidation here or am I just lucky with realloc? What guarantees are given for an iterator returned for &'a Vec<T>?
fn main() {
    let mut v = vec![0; 2];
    println!("capacity: {}", v.capacity());
    {
        let v_ref = &mut v;
        for _each in v_ref.clone() {
            for _ in 0..101 {
                (*v_ref).push(1); // ?
            }
        }
    }
    println!("capacity: {}", v.capacity());
}

In Rust, most methods take an &self - a reference to self. In most circumstances, a call like some_string.len() internally "expands" to something like this:
let a: String = "abc".to_string();
let a_len: usize = String::len(&a); // This is identical to calling `a.len()`.
However, consider a reference to an object: a_ref, which is an &String that references a. Rust is smart enough to determine whether a reference needs to be added or removed, as we saw above (a becomes &a). In this case, a_ref.len() expands to:
let a: String = "abc".to_string();
let a_ref: &String = &a;
let a_len: usize = String::len(a_ref); // This is identical to calling `a_ref.len();`. Since `a_ref` is a reference already, it doesn't need to be altered.
Notice that this is basically equivalent to the original example, except that we're using an explicitly-set reference to a rather than a directly.
This means that v.clone() expands to Vec::clone(&v), and similarly, v_ref.clone() expands to Vec::clone(v_ref). Since v_ref is &v (or, specifically, &mut v), we can simplify this back into Vec::clone(&v). In other words, these calls are equivalent: calling clone() on a basic reference (&) to an object does not clone the reference, it clones the referenced object.
In other words, Tamas Hedgeus' comment is correct: You are iterating over a new vector, which contains elements that are clones of the elements in v. The item being iterated over in your for loop is not a &Vec, it's a Vec that is separate from v, and therefore iterator invalidation is not an issue.
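To make that concrete, here is a minimal sketch (the helper name push_while_iterating_clone is made up for illustration). The loop consumes a clone, so pushing to the original through the mutable reference is accepted, and the loop body still runs once per original element:

```rust
/// Iterates over a clone of `v` while pushing to `v` itself.
/// Returns how many times the loop body ran.
fn push_while_iterating_clone(v: &mut Vec<i32>) -> usize {
    let mut iterations = 0;
    // `v.clone()` auto-dereferences to `Vec::clone(&*v)`, producing a new,
    // independent Vec<i32>; the loop consumes that copy, not `v`.
    for _each in v.clone() {
        v.push(1); // mutates the original only
        iterations += 1;
    }
    iterations
}
```

Called on a two-element vector, this should run the body twice and leave the original with four elements, whatever reallocation happens along the way.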
As for your question about the guarantees Rust provides, you'll find that Rust's borrow checker handles this rather well without any strings attached.
If you were to remove clone() from the for loop, though, you would receive the error use of moved value: '*v_ref', because v_ref is considered 'moved' into the for loop when you iterate over it and cannot be used for the remainder of the function. To avoid this, the iter function creates an iterator object that only borrows the vector, allowing you to reuse the vector after the loop ends (and the iterator is dropped).
And if you were to try iterating over and mutating v without the v_ref abstraction, the error reads cannot borrow 'v' as mutable because it is also borrowed as immutable. v is borrowed immutably by the iterator created by v.iter() (which has the signature fn iter(&self) -> Iter<T>; note that it borrows the vector), so the borrow checker will not let you mutate the vector until the iterator is dropped at the end of the for loop. However, since you can have multiple immutable references to a single object, you can still read from the vector within the for loop, just not write to it.
If you need to mutate an element of a vector while iterating over the vector, you can use iter_mut, which returns mutable references to one element at a time and lets you change that element only. You still cannot mutate the iterated vector itself with iter_mut, because Rust ensures that there is only one mutable reference to an object at a time, as well as ensuring there are no mutable references to an object in the same scope as immutable references to that object.
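A short sketch of both cases, with hypothetical helper names. double_all uses iter_mut to change elements in place. append_copies shows one common workaround for growing the vector during the loop (this is my own illustration, not something from the answer above): loop over indices, so that no iterator holds a borrow while push runs.

```rust
/// Doubles every element in place via `iter_mut`.
fn double_all(v: &mut Vec<i32>) {
    for x in v.iter_mut() {
        *x *= 2; // only the current element is reachable here
    }
}

/// Appends a copy of every original element by looping over indices,
/// so no iterator borrows `v` while `push` runs.
fn append_copies(v: &mut Vec<i32>) {
    let original_len = v.len(); // fixed up front; pushes do not extend the loop
    for i in 0..original_len {
        let value = v[i];
        v.push(value);
    }
}
```

Because original_len is read before the loop starts, the pushes cannot make the loop run forever.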

Related

How can I load all entries of a Vec<T> of arbitrary length onto the stack?

I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.
You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]
fn example<T>(v: Vec<T>) {
    let s: [T] = *v.into_boxed_slice();
    dbg!(std::mem::size_of_val(&s));
}
fn main() {
    let x = vec![42; 100];
    example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
"I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec"
Sure you can.
"Vec [...] seems to stride over pointers on the heap, dereferencing at each entry"
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
"for fast access"
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.
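For completeness, on stable Rust a Vec can be moved into a real fixed-size array only when the length is known at compile time, which is an assumption the question does not make. A minimal sketch using the standard TryFrom conversion, with a hypothetical helper name and a length of 4 picked purely for illustration:

```rust
use std::convert::TryFrom;

/// Moves the elements of `v` into a fixed-size array.
/// Returns `None` when `v` does not hold exactly 4 elements.
fn to_array4(v: Vec<i32>) -> Option<[i32; 4]> {
    // The conversion fails (handing the Vec back as the error) on a length mismatch.
    <[i32; 4]>::try_from(v).ok()
}
```

As noted above, whether this copy buys anything over indexing the Vec directly is something to measure rather than assume.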

How to safely remove item from a vector?

Let's say I have this vector:
let mut v = vec![1,2,3];
And I want to remove some item from it:
v.remove(3);
It panics. How can I catch/gracefully handle that panic? I tried to use panic::catch_unwind but it doesn't seem to work with vectors (std::vec::Vec<i32> may not be safely transferred across an unwind boundary). Should I manually check if item exists at an index before removing it?
In general, vector and slice methods consider it a programming error if they receive an index that is out of range, and the convention in Rust is to panic for programming errors. If your code panics, you generally need to fix the code to uphold the invariant that was disregarded.
Some of the slice methods have variants that don't panic for invalid indices. One example is the indexing operator [index], which panics for an out-of-bounds index, and its counterpart the get() method, which returns None if the index is out of bounds.
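As a small illustration of the non-panicking variant, here is a sketch with a made-up helper, get_or, that falls back to a default instead of panicking:

```rust
/// Returns the element at `index`, or `fallback` when the index is out of bounds.
fn get_or(v: &[i32], index: usize, fallback: i32) -> i32 {
    // `get` returns Option<&i32> instead of panicking like `v[index]` would.
    v.get(index).copied().unwrap_or(fallback)
}
```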
The remove() method does not have an equivalent that does not panic. You should check the index manually before passing it in:
if index < v.len() {
    v.remove(index);
} else {
    // Handle error
}
In real applications, this should rarely be necessary, though. The code that generates the index to be deleted can usually be written in a way that it will only yield in-bounds indices.
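If you do need the check in several places, it can be wrapped once. This is only a sketch; try_remove is a hypothetical name, not a standard library method:

```rust
/// Removes and returns the element at `index`, or returns `None`
/// (leaving `v` untouched) when the index is out of bounds.
fn try_remove<T>(v: &mut Vec<T>, index: usize) -> Option<T> {
    if index < v.len() {
        Some(v.remove(index)) // in bounds, so `remove` cannot panic
    } else {
        None
    }
}
```

The caller then handles the out-of-bounds case with ordinary Option handling (match, if let, ok_or, and so on) instead of trying to catch a panic.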

How can I take an item from a Vec in Rust?

I'm looking for a method that consumes a Vec and returns one element, without the overhead of restoring Vec's invariants the way remove and swap_remove do:
fn take<T>(vec: Vec<T>, index: usize) -> Option<T>
However, I can't find such a method. Am I missing something? Is this actually unsafe or impossible?
This is a different question from Built in *safe* way to move out of Vec<T>?
There the goal was a remove method that didn't panic on out of bounds access and returned a Result. I'm looking for a method that consumes a Vec and returns one of the elements. None of the answers to the above question address my question.
You can write your function like this:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if vec.get(index).is_none() {
        None
    } else {
        Some(vec.swap_remove(index))
    }
}
The code you see here (get and swap_remove) is guaranteed O(1).
However, kind of hidden, vec is dropped at the end of the function and this drop operation is likely not O(1), but O(n) (where n is vec.len()). If T implements Drop, then drop() is called for every element still inside the vector, meaning dropping the vector is guaranteed O(n). If T does not implement Drop, then the Vec only needs to deallocate the memory. The time complexity of the dealloc operation depends on the allocator and is not specified, so we cannot assume it is O(1).
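The hidden drop can be made visible with a sketch like the following. Noisy and drops_after_take are names invented for this illustration. Each element bumps a shared counter when it is dropped, so we can count how many elements were destroyed by the time take returns:

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical element type that counts how many times it has been dropped.
struct Noisy {
    drops: Rc<Cell<usize>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if index < vec.len() {
        Some(vec.swap_remove(index))
    } else {
        None
    }
    // `vec` goes out of scope here: every remaining element is dropped.
}

/// Builds `n` counted elements, takes one out, and reports how many
/// elements were dropped while the taken element was still alive.
fn drops_after_take(n: usize, index: usize) -> usize {
    let drops = Rc::new(Cell::new(0));
    let vec: Vec<Noisy> = (0..n).map(|_| Noisy { drops: Rc::clone(&drops) }).collect();
    let taken = take(vec, index);
    let seen = drops.get(); // read before `taken` itself is dropped
    drop(taken);
    seen
}
```

With 10 elements and an in-bounds index, the other 9 elements are dropped inside take. That is the O(n) part.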
To mention another solution using iterators:
fn take<T>(vec: Vec<T>, index: usize) -> Option<T> {
    vec.into_iter().nth(index)
}
I was about to write this:
"While Iterator::nth() usually is a linear time operation, the iterator over a vector overrides this method to make it an O(1) operation."
But then I noticed that this is only true for the iterator over slices. The std::vec::IntoIter iterator, which would be used in the code above, doesn't override nth(). It has been attempted here, but it doesn't seem to be that easy.
So, as of right now, the iterator solution above is an O(n) operation! Not to mention the time needed to drop the vector, as explained above.
The reason fn take<T>(vec: Vec<T>, index: usize) -> Option<T> does not exist in the standard library is that it is not very useful in general. For example, supposing that you have a Vec<String> of length 10, it means throwing away 9 strings and only using 1. This seems wasteful.
In general, the standard library will try to provide an API that is useful in a maximum of scenarios, and in this instance it would be more logical to have a fn take<T>(vec: &mut Vec<T>, index: usize) -> Option<T>.
The only question is how to preserve the invariant, of course:
it can be preserved by exchanging with the last element, which is what Vec::swap_remove does,
it can be preserved by shifting the successor elements in, which is what Vec::drain does.
Those are very flexible, and can be adapted to fill more specific scenarios, such as yours.
Adapting swap_remove:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if index < vec.len() {
        Some(vec.swap_remove(index))
    } else {
        None
    }
}
Adapting drain:
fn take<T>(mut vec: Vec<T>, index: usize) -> Option<T> {
    if index < vec.len() {
        vec.drain(index..index+1).next()
    } else {
        None
    }
}
Note that the former is more efficient: swap_remove is O(1), whereas drain has to shift every element after the removed one.
"I'm looking for a method that consumes the Vec and returns one element, without the overhead of restoring Vec's invariants the way remove and swap_remove do."
This reeks of premature micro-optimization to me.
First of all, note that it is necessary to destroy the elements of the vector; you can accomplish this in two ways:
swap_remove, then iterate over each element to destroy them,
iterate over each element to destroy them, skipping the specific index.
It is not clear to me that the latter would be faster than the former; if anything it looks more complicated, with more branches (I advise two loops), which may throw off the predictor and may be less amenable to vectorization.
Secondly, before complaining about the overhead of restoring the Vec's invariant, have you properly profiled the solution?
If we look at the swap_remove variant, there are 3 steps:
swap_remove (O(1)),
destroy each remaining element (O(N)),
free the backing memory.
Step 2 may be optimized out if the element has no Drop implementation, but otherwise I would bet it's a toss-up whether (2) or (3) dominates the cost.
TL;DR: I am afraid that you are fighting ghost issues, profile before trying to optimize.

Convert Vec<T> to Vec<&T> [duplicate]

I can convert Vec<String> to Vec<&str> this way:
let mut items = Vec::<&str>::new();
for item in &another_items {
    items.push(item);
}
Are there better alternatives?
There are quite a few ways to do it, some have disadvantages, others simply are more readable to some people.
This dereferences s (which is of type &String) to a String "right hand side reference", which is then dereferenced through the Deref trait to a str "right hand side reference" and then turned back into a &str. This is something that is very commonly seen in the compiler, and I therefore consider it idiomatic.
let v2: Vec<&str> = v.iter().map(|s| &**s).collect();
Here the deref function of the Deref trait is passed to the map function. It's pretty neat but requires importing the trait with use or giving the full path.
let v3: Vec<&str> = v.iter().map(std::ops::Deref::deref).collect();
This uses coercion syntax.
let v4: Vec<&str> = v.iter().map(|s| s as &str).collect();
This takes a RangeFull slice of the String (just a slice into the entire String) and takes a reference to it. It's ugly in my opinion.
let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect();
This uses coercion to convert a &String into a &str. It may also be replaceable by an s: &str expression in the future.
let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect();
The following (thanks #huon-dbaupp) uses the AsRef trait, which solely exists to map from owned types to their respective borrowed type. There are two ways to use it, and again, the prettiness of either version is entirely subjective.
let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect();
and
let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect();
My bottom line is use the v8 solution since it most explicitly expresses what you want.
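As a quick sanity check that these variants really are interchangeable, here is a sketch wrapping two of them in functions (the names as_strs and as_strs_deref are made up) so their results can be compared:

```rust
/// Converts owned strings to borrowed string slices using `AsRef<str>`.
fn as_strs(v: &[String]) -> Vec<&str> {
    // The return type tells the compiler which `AsRef` target to pick (`str`).
    v.iter().map(AsRef::as_ref).collect()
}

/// Same conversion using an explicit reborrow through `Deref`.
fn as_strs_deref(v: &[String]) -> Vec<&str> {
    v.iter().map(|s| &**s).collect()
}
```

Both borrow from the original strings, so the resulting Vec<&str> cannot outlive the Vec<String> it was built from.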
The other answers simply work. I just want to point out that if you are trying to convert the Vec<String> into a Vec<&str> only to pass it to a function taking Vec<&str> as argument, consider revising the function signature as:
fn my_func<T: AsRef<str>>(list: &[T]) { ... }
instead of:
fn my_func(list: &Vec<&str>) { ... }
As pointed out by this question: Function taking both owned and non-owned string collections. In this way both vectors simply work without the need of conversions.
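A minimal sketch of that signature in use. The function total_len and its body are purely illustrative; only the T: AsRef<str> bound is taken from the suggestion above:

```rust
/// Hypothetical function: returns the total byte length of all strings.
/// Accepts slices of anything that can be viewed as a `str`.
fn total_len<T: AsRef<str>>(list: &[T]) -> usize {
    list.iter().map(|s| s.as_ref().len()).sum()
}
```

Both total_len(&vec_of_string) and total_len(&vec_of_str) should compile, because &Vec<T> coerces to &[T] and both String and &str implement AsRef<str>.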
All of the answers idiomatically use iterators and collecting instead of a loop, but do not explain why this is better.
In your loop, you first create an empty vector and then push into it. Rust makes no guarantees about the strategy it uses for growing vectors, but I believe the current strategy is that whenever the capacity is exceeded, the vector capacity is doubled. If the original vector had a length of 20 and the capacity started at 1, that would be one allocation and 5 reallocations (capacities 1, 2, 4, 8, 16, 32).
Iterating from a vector produces an iterator that has a "size hint". In this case, the iterator implements ExactSizeIterator so it knows exactly how many elements it will return. map retains this and collect takes advantage of this by allocating enough space in one go for an ExactSizeIterator.
You can also manually do this with:
let mut items = Vec::<&str>::with_capacity(another_items.len());
for item in &another_items {
    items.push(item);
}
Heap allocations and reallocations are probably the most expensive part of this entire thing by far; far more expensive than taking references or writing or pushing to a vector when no new heap allocation is involved. It wouldn't surprise me if pushing a thousand elements onto a vector allocated for that length in one go were faster than pushing 5 elements that required 2 reallocations and one allocation in the process.
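One rough way to observe this is to count how often the capacity changes while pushing, as in the sketch below (capacity_changes is a hypothetical helper). Since the growth strategy is unspecified, the only things it is safe to rely on are that a vector created with with_capacity(n) does not reallocate during the first n pushes, and that an empty Vec::new() has to allocate at least once:

```rust
/// Pushes `n` items one at a time and counts how often the capacity changes,
/// which approximates the number of allocations and reallocations.
fn capacity_changes(mut v: Vec<usize>, n: usize) -> usize {
    let mut changes = 0;
    let mut last = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last {
            changes += 1;
            last = v.capacity();
        }
    }
    changes
}
```

Calling it with Vec::with_capacity(20) and n = 20 should report zero changes. Calling it with Vec::new() reports however many growth steps the current implementation happens to take.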
Another unsung advantage is that the collect-based versions do not need a mutable variable, which one should avoid when it is not needed.
another_items.iter().map(|item| item.deref()).collect::<Vec<&str>>()
To use deref() you must add use std::ops::Deref;
This one uses collect:
let strs: Vec<&str> = another_items.iter().map(|s| s as &str).collect();
Here is another option:
use std::iter::FromIterator;
let v = Vec::from_iter(v.iter().map(String::as_str));
Note that String::as_str is stable since Rust 1.7.

In OCaml, how do I re-assign a global variable inside a function

My program has the following global variable:
let a = (0.0,0.0);;
And the following, where eval e1 returns a string_of_float and somefunc e2 returns a tuple.
let rec output_expr = function
Binop(e1, op, e2) ->
let onDist = float_of_string(eval e1) and onDir = somefunc e2 in
let newA = onDir in (
fprintf oc "\n\t%s" ("blah");
fprintf oc "\n\t%s" ("blah");
fprintf oc "\n\t%s" ("blah");
let a = newA
)
Now, the code above gives me the following error:
Error: This expression has type bool
but an expression was expected of type unit
Command exited with code 2.
I want let a = newA to change the value of the global variable a. How can I do that?
To do it you need to make the value a reference,
let a = ref (0.0, 0.0)
then later that state can change by,
a := (1.0, 2.0);
In a functional world you would not want to have this global state. Sometimes it is very helpful, but in this particular case that is doubtful. You should pass the value a into your function and return a new value (a') that can be used subsequently; note that the value never changes, but new values take the place and are used in further computation.
In your particular case, I think you need to ask yourself why a function named output_expr modifies some global state, or returns anything but unit. But maybe this is a toy example for our consumption, so I will leave it at that.
You cannot assign to a variable (local or global is the same) in OCaml. There's simply no syntax in the language for it. In other words, variables in OCaml are what other languages call "constants" -- they get a value once in initialization, and that's it.
However, you can use a mutable data structure, which offers ways to modify its contents. Data structures are reference types: you can hold a reference to the data structure in a variable and modify the contents without needing to assign to the variable.
nlucaroni mentioned such a data structure, ref, which is a simple mutable cell holding a value of the desired type. There are other mutable data structures, like arrays, strings, and any record with mutable fields. Each has its own way of modifying the contents.
However, mutable state can mostly be avoided in functional programming, and if you are relying on mutable state, it may be an indication that you are not doing it the functional way.
In OCaml, values are immutable. You can't change the content of a value and should reorganize your code so that you don't need to.
Here your function output_expr should return the newA and this value should be used instead of a after that.
Actually, you can have mutable variables using references, but you should only use them if you know what you are doing and think they are better suited for a particular use case, never because you don't understand immutability.

Resources