How can I copy a vector to another location and reuse the existing allocated memory? - vector

In C++, to copy the contents of a vector to another vector we use the assignment operator dest = src. However, in Rust src would be moved into dest and no longer usable.
I know the simplest answer is to do dest = src.clone() (for the sake of this question we'll assume T in Vec<T> is Clone). However - if I'm understanding correctly - this creates a brand new third vector with the copied contents of src and moves it into dest, throwing away dest's dynamically allocated array. If this is correct, it's a completely unnecessary dynamic allocation when we could have just copied the content directly into dest (assuming it had sufficient capacity).
Below is a function I've made that does exactly what I would like to do: empty out the dest vector and copy the elements of src to it.
// copy contents of src to dest without just cloning src
fn copy_content<T: Clone>(dest: &mut Vec<T>, src: &Vec<T>) {
dest.clear();
if dest.capacity() < src.len() {
dest.reserve(src.len());
}
for x in src {
dest.push(x.clone());
}
}
Is there a way to do this with builtin or standard library utilities? Is the dest = src.clone() optimized by the compiler to do this anyway?
I know that if T has dynamic resources then the extra allocation from src.clone() isn't a big deal, but if T is e.g. i32 or any other Copy type then it forces an allocation where none are necessary.

Did you ever look at the definition of Clone? It has the well known clone method but also a useful but often forgotten clone_from method:
pub trait Clone : Sized {
fn clone(&self) -> Self;
fn clone_from(&mut self, source: &Self) {
*self = source.clone()
}
}
To quote the doc:
Performs copy-assignment from source.
a.clone_from(&b) is equivalent to a = b.clone() in functionality, but can be overridden to reuse the resources of a to avoid unnecessary allocations.
Of course a type such as Vec does not use the provided-by-default clone_from and defines its own in a more efficient way, similar to what you would get in C++ from writing dest = src:
fn clone_from(&mut self, other: &Vec<T>) {
other.as_slice().clone_into(self);
}
with [T]::clone_into being defined as:
fn clone_into(&self, target: &mut Vec<T>) {
// drop anything in target that will not be overwritten
target.truncate(self.len());
let len = target.len();
// reuse the contained values' allocations/resources.
target.clone_from_slice(&self[..len]);
// target.len <= self.len due to the truncate above, so the
// slice here is always in-bounds.
target.extend_from_slice(&self[len..]);
}

Related

How to keep using a value after pushing it into a vector?

So I want to keep using a value after pushing it into a vector, but when I add it to the vector it takes ownership control over the variable, so when I want to make reference to it again I can't. How should I approach this scenario?
fn scan_recursive(dir: &Path) -> io::Result<Vec<PathBuf>> {
let mut files = Vec::new(); // Create a mutable vector to store the files.
for entry in fs::read_dir(dir)? {
let entry = entry?;
let path = entry.path();
if path.is_file() {
files.push(path);
}
if path.is_dir() { // ERROR: path is no longer valid
files.append(&mut scan_recursive(&path)?);
}
}
Ok(files)
}
The problem is that the compiler does not know if the first if is executed and cannot tell if the path variable is consumed or not, so it assumes that it always is, and the lifetime of path ends after the push to the vector.
Given that your two ifs are mutually exclusive (the path is either a file or a directory) you can use if ... else, which will let the compiler deduce that if the path is not a file, the variables has not been consumed in the if branch and it will still be available in the else branch.
if path.is_file() {
files.push(path);
}
else if path.is_dir() {
files.append(&mut scan_recursive(&path)?);
}

How to avoid cloning parts when changing a mutable struct while recursion over that struct

I try to go recursively through a rose tree. The following code also works as intended but I still have the problem that I need to clone the value due to issues with the borrow checker. Therefore, it would be nice if there is a way to change from cloning to something better.
Without the clone() rust complains (rightfully) that I borrow self mutable by looking at the child nodes and the second time in the closure.
The whole structure and code is more complicated and bigger than shown below but that are the core elements. Do I have to change the data structure or do I miss something obvious? If the data structure is the issue how would you change it?
Also, the NType enum seems kinda useless here but I have some additional kinds that I have to consider. Here Inner nodes always have children and Outer nodes never.
enum NType{
Inner,
Outer
}
#[derive(Eq, PartialEq, Clone, Debug)]
struct Node {
// isn't a i32 actually. In my real program it's another struct
count: i32,
n_type: NType,
children: Option<Vec<usize>>
}
#[derive(Eq, PartialEq, Clone, Debug)]
struct Tree {
nodes: Vec<Node>,
}
impl Tree{
pub fn calc(&mut self, features: &Vec<i32>) -> i32{
// root is the last node
self.calc_h(self.nodes.len() - 1, features);
self.nodes[self.nodes.len() - 1].count.clone()
}
fn calc_h(&mut self, current: usize, features: &Vec<i32>){
// do some other things to decide where to go into recursion and where not to
// also use the features
if self.nodes[current].n_type == Inner{
//cloneing is very expensiv and destroys the performance
self.nodes[current].children.as_ref().unwrap().clone().iter().for_each(|&n| self.calc_h(n, features));
self.do_smt(current)
}
self.do_smt(current)
}
}
Edit:
Lagerbaer suggested to use as_mut but that results into current being a &mut usize and that doesn't really solve the problem.
changed childs into children
The correct plural of child is children so this is what I will refer to in this answer. Presumably this is what childs means in your code.
Since node.children is already an Option, the best solution would be to .take() the vector out of the node at the start of the iteration and put it in at the end. This way we avoid holding a reference to tree.nodes during the iteration.
if self.nodes[current].n_type == Inner {
let children = self.nodes[current].children.take().unwrap();
for &child in children.iter() {
self.calc_h(child, features);
}
self.nodes[current].children = Some(children);
}
Note that the behavior is different from your original code in case of cycles, but this is not something you need to worry about if the rest of the tree is implemented correctly.

How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate]

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

Cannot move out of borrowed content when borrowing a generic type

I have a program that more or less looks like this
struct Test<T> {
vec: Vec<T>
}
impl<T> Test<T> {
fn get_first(&self) -> &T {
&self.vec[0]
}
fn do_something_with_x(&self, x: T) {
// Irrelevant
}
}
fn main() {
let t = Test { vec: vec![1i32, 2, 3] };
let x = t.get_first();
t.do_something_with_x(*x);
}
Basically, we call a method on the struct Test that borrows some value. Then we call another method on the same struct, passing the previously obtained value.
This example works perfectly fine. Now, when we make the content of main generic, it doesn't work anymore.
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(*x);
}
Then I get the following error:
error: cannot move out of borrowed content
src/main.rs:14 let raw_x = *x;
I'm not completely sure why this is happening. Can someone explain to me why Test<i32> isn't borrowed when calling get_first while Test<T> is?
The short answer is that i32 implements the Copy trait, but T does not. If you use fn generic_main<T: Copy>(t: Test<T>), then your immediate problem is fixed.
The longer answer is that Copy is a special trait which means values can be copied by simply copying bits. Types like i32 implement Copy. Types like String do not implement Copy because, for example, it requires a heap allocation. If you copied a String just by copying bits, you'd end up with two String values pointing to the same chunk of memory. That would not be good (it's unsafe!).
Therefore, giving your T a Copy bound is quite restrictive. A less restrictive bound would be T: Clone. The Clone trait is similar to Copy (in that it copies values), but it's usually done by more than just "copying bits." For example, the String type will implement Clone by creating a new heap allocation for the underlying memory.
This requires you to change how your generic_main is written:
fn generic_main<T: Clone>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x.clone());
}
Alternatively, if you don't want to have either the Clone or Copy bounds, then you could change your do_something_with_x method to take a reference to T rather than an owned T:
impl<T> Test<T> {
// other methods elided
fn do_something_with_x(&self, x: &T) {
// Irrelevant
}
}
And your generic_main stays mostly the same, except you don't dereference x:
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x);
}
You can read more about Copy in the docs. There are some nice examples, including how to implement Copy for your own types.

Mutable vectors in struct

I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
def __init__(self, target, weight):
self.target = target
self.weight = weight
class WeightedGraph(object):
def __init__(self, initial_size):
self.adjacency_list = [[] for i in range(initial_size)]
self.size = initial_size
self.edge_count = 0
def add_edge(self, source, target, weight):
self.adjacency_list[source].append(Edge(target, weight))
self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
target: uint,
weight: f64
}
struct WeightedGraph {
adjacency_list: ~Vec<~Vec<Edge>>,
size: uint,
edge_count: int
}
impl WeightedGraph {
fn new(num_nodes: uint) -> WeightedGraph {
let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
WeightedGraph {
adjacency_list: adjacency_list,
size: num_nodes,
edge_count: 0
}
}
fn add_edge(mut self, source: uint, target: uint, weight: f64) {
self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
self.edge_count += 1;
}
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references, you have to use get_mut if you want to modify the data
You only need Vec<Vec<Edge>>, Vec is the right thing to use, ~[] was for that purpose in the past but now means something else (or will, not sure if that is changed already)
You also have to change the signature of add_edge to take &mut self because now you are moving the ownership of self to add_edge and that is not what you want

Resources