How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate] - vector

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?

Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.

No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.

You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

Related

How to check if a Box is a null pointer?

I want to implement a stack using pointers or something. How can I check if a Box is a null pointer? I seen some code with Option<Box<T>> and Box<Option<T>> but I don't understand this. This is as far as I went:
struct Node {
value: i32,
next: Box<Node>,
}
struct Stack {
top: Box<Node>,
}
Box<T> can never be NULL, therefore there is nothing to check.
Box<T> values will always be fully aligned, non-null pointers
— std::box
You most likely wish to use Option to denote the absence / presence of a value:
struct Node {
value: i32,
next: Option<Box<Node>>,
}
struct Stack {
top: Option<Box<Node>>,
}
See also:
Should we use Option or ptr::null to represent a null pointer in Rust?
How to set a field in a struct with an empty value?
What is the null pointer optimization in Rust?
You don't want null. null is an unsafe antipattern even in languages where you have to use it, and thankfully Rust rids us of the atrocity. Box<T> always contains a T, never null. Rust has no concept of null.
As you've correctly pointed out, if you want a value to be optional, you use Option<T>. Whether you do Box<Option<T>> or Option<Box<T>> really doesn't matter that much, and someone who knows a bit more about the lower-level side of things can chime in on which is more efficient.
struct Node {
value: i32,
next: Option<Box<Node>>,
}
struct Stack {
top: Option<Box<Node>>,
}
The Option says "this may or may not exist" and the Box says "this value is on the heap. Now, the nice thing about Option that makes it infinitely better than null is that you have to check it. You can't forget or the compiler will complain. The typical way to do so is with match
match my_stack.top {
None => {
// Top of stack is not present
}
Some(x) => {
// Top of stack exists, and its value is x of type Box<T>
}
}
There are tons of helper methods on the Option type itself to deal with common patterns. Below are just a few of the most common ones I use. Note that all of these can be implemented in terms of match and are just convenience functions.
The equivalent of the following Java code
if (value == null) {
result = null;
} else {
result = ...;
}
is
let result = value.map(|v| ...)
Or, if the inner computation can feasibly produce None as well,
let result = value.and_then(|v| ...)
If you want to provide a default value, say zero, like
if (value == null) {
result = 0;
} else {
result = value;
}
Then you want
result = value.unwrap_or(0)
It's probably best to stop thinking in terms of how you would handle null and start learning Option<T> from scratch. Once you get the hang of it, it'll feel ten times safer and more ergonomic than null checks.
A Box<T> is a pointer to some location on the heap that contains some data of type T. Rust guarantees that Box<T> will never be a null pointer, i.e the address should always be valid as long as you aren't doing anything weird and unsafe.
If you need to represent a value that might not be there (e.g this node is the last node, so there is no next node), you can use the Option type like so
struct Node {
value: i32,
next: Option<Box<Node>>,
}
struct Stack {
top: Option<Box<Node>>,
}
Now, with Option<Box<Node>>, Node can either have a next Node or no next node. We can check if the Option is not None like so
fn print_next_node_value(node: &Node) {
match &node.next {
Some(next) => println!("the next value is {}", next.value),
None => println!("there is no next node")
}
}
Because a Box is just a pointer to some location on the heap, it can be better to use Option<Box<T>> instead of Box<Option<T>>. This is because the second one will allocate an Option<T> on the heap, while the first one will not. Additionally, Option<Box<T>> and Box<T> are equally big (both are 8 bytes). This is because Rust knows that Box<T> can never be all zeros (i.e can never be the null pointer), so it can use the all-0's state to represent the None case of Option<Box<T>>.

Is there a better functional way to process a vector with error checking?

I'm learning Rust and would like to know how I can improve the code below.
I have a vector of tuples of form (u32, String). The u32 values represent line numbers and the Strings are the text on the corresponding lines. As long as all the String values can be successfully parsed as integers, I want to return an Ok<Vec<i32>> containing the just parsed String values, but if not I want to return an error of some form (just an Err<String> in the example below).
I'm trying to learn to avoid mutability and use functional styles where appropriate, and the above is straightforward to do functionally if that was all that was needed. Here's what I came up with in this case:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
sv.iter()
.map(|s| s.1.parse::<i32>()
.map_err(|_e| "*** Invalid data.".to_string()))
.collect()
}
However, the small catch is that I want to print an error message for every invalid value (and not just the first one), and the error messages should contain both the line number and the string values in the offending tuple.
I've managed to do it with the following code:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
sv.iter()
.map(|s| (s.0, s.1.parse::<i32>()
.or_else(|e| {
eprintln!("ERROR: Invalid data value at line {}: '{}'",
s.0, s.1);
Err(e)
})))
.collect::<Vec<(u32, Result<i32, _>)>>() // Collect here to avoid short-circuit
.iter()
.map(|i| i.1
.clone()
.map_err(|_e| "*** Invalid data.".to_string()))
.collect()
}
This works, but seems rather messy and cumbersome - especially the typed collect() in the middle to avoid short-circuiting so all the errors are printed. The clone() call is also annoying, and I'm not really sure why it's needed - the compiler says I'm moving out of borrowed content otherwise, but I'm not really sure what's being moved. Is there a way it can be done more cleanly? Or should I go back to a more procedural style? When I tried, I ended up with mutable variables and a flag to indicate success and failure, which seems less elegant:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
let mut datavals = Vec::new();
let mut success = true;
for s in sv {
match s.1.parse::<i32>() {
Ok(v) => datavals.push(v),
Err(_e) => {
eprintln!("ERROR: Invalid data value at line {}: '{}'",
s.0, s.1);
success = false;
},
}
}
if success {
return Ok(datavals);
} else {
return Err("*** Invalid data.".to_string());
}
}
Can someone advise me on the best way to do this? Should I stick to the procedural style here, and if so can that be improved? Or is there a cleaner functional way to do it? Or a blend of the two? Any advice appreciated.
I think that's what partition_map() from itertools is for:
use itertools::{Either, Itertools};
fn data_vals<'a>(sv: &[&'a str]) -> Result<Vec<i32>, Vec<(&'a str, std::num::ParseIntError)>> {
let (successes, failures): (Vec<_>, Vec<_>) =
sv.iter().partition_map(|s| match s.parse::<i32>() {
Ok(v) => Either::Left(v),
Err(e) => Either::Right((*s, e)),
});
if failures.len() != 0 {
Err(failures)
} else {
Ok(successes)
}
}
fn main() {
let numbers = vec!["42", "aaaezrgggtht", "..4rez41eza", "55"];
println!("{:#?}", data_vals(&numbers));
}
In a purely functional style, you have to avoid side-effects.
Printing errors is a side-effect. The preferred style would be to return an object of the style:
Result<Vec<i32>, Vec<String>>
and print the list after the data_vals function returns.
So, essentially, you want your processing to collect a list of integers, and a list of strings:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, Vec<String>> {
let (ok, err): (Vec<_>, Vec<_>) = sv
.iter()
.map(|(i, s)| {
s.parse()
.map_err(|_e| format!("ERROR: Invalid data value at line {}: '{}'", i, s))
})
.partition(|e| e.is_ok());
if err.len() > 0 {
Err(err.iter().filter_map(|e| e.clone().err()).collect())
} else {
Ok(ok.iter().filter_map(|e| e.clone().ok()).collect())
}
}
fn main() {
let input = vec![(1, "0".to_string())];
let r = data_vals(&input);
assert_eq!(r, Ok(vec![0]));
let input = vec![(1, "zzz".to_string())];
let r = data_vals(&input);
assert_eq!(r, Err(vec!["ERROR: Invalid data value at line 1: 'zzz'".to_string()]));
}
Playground Link
This uses partition which does not depend on an external crate.
Side effects (eprintln!) in an iterator adapter are definitely not "functional". You should accumulate and return the errors and let the caller deal with them.
I would use fold here. The goal of fold is to reduce a list to a single value, starting from an initial value and augmenting the result with every item. This "single value" can very well be a list, though. Here, though, there are two possible lists we might want to return: a list of i32 if all values are valid, or a list of errors if there are any errors (I've chosen to return Strings for errors here, for simplicity.)
fn data_vals(sv: &[(u32, String)]) -> Result<Vec<i32>, Vec<String>> {
sv.iter().fold(
Ok(Vec::with_capacity(sv.len())),
|acc, (line_number, data)| {
let data = data
.parse::<i32>()
.map_err(|_| format!("Invalid data value at line {}: '{}'", line_number, data));
match (acc, data) {
(Ok(mut acc_data), Ok(this_data)) => {
// No errors yet; push the parsed value to the values vector.
acc_data.push(this_data);
Ok(acc_data)
}
(Ok(..), Err(this_error)) => {
// First error: replace the accumulator with an `Err` containing the first error.
Err(vec![this_error])
}
(Err(acc_errors), Ok(..)) => {
// There have been errors, but this item is valid; ignore it.
Err(acc_errors)
}
(Err(mut acc_errors), Err(this_error)) => {
// One more error: push it to the error vector.
acc_errors.push(this_error);
Err(acc_errors)
}
}
},
)
}
fn main() {
println!("{:?}", data_vals(&[]));
println!("{:?}", data_vals(&[(1, "123".into())]));
println!("{:?}", data_vals(&[(1, "123a".into())]));
println!("{:?}", data_vals(&[(1, "123".into()), (2, "123a".into())]));
println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123".into())]));
println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123b".into())]));
}
The initial value is Ok(Vec::with_capacity(sv.len())) (this is an optimization to avoid reallocating the vector as we push items to it; a simpler version would be Ok(vec![])). If the slice is empty, this will be fold's result; the closure will never be called.
For each item, the closure checks 1) whether there were any errors so far (indicated by the accumulator value being an Err) or not and 2) whether the current item is valid or not. I'm matching on two Result values simultaneously (by combining them in a tuple) to handle all 4 cases. The closure then returns an Ok if there are no errors so far (with all the parsed values so far) or an Err if there are any errors so far (with every invalid value found so far).
You'll notice I used the push method to add an item to a Vec. This is, strictly speaking, mutation, which is not considered "functional", but because we are moving the Vecs here, we know there are no other references to them, so we know we aren't affecting any other use of these Vecs.

Function with same name as struct

In general, I prefer to write initializer functions with descriptive names. However, for some structs, there is an obvious default initializer function. The standard Rust name for such a function is new, placed in the impl block for the struct. However, today I realized that I can give a function the same name as a struct, and thought this would be a good way to implement the obvious initializer function. For example:
#[derive(Debug, Clone, Copy)]
struct Pair<T, U> {
first: T,
second: U,
}
#[allow(non_snake_case)]
fn Pair<T, U>(first: T, second: U) -> Pair<T, U> {
Pair::<T, U> {
first: first,
second: second,
}
}
fn main(){
let x = Pair(1, 2);
println!("{:?}", x);
}
This is, in my opinion, much more appealing than this:
let x = Pair::new(1, 2);
However, I've never seen anyone else do this, and my question is simply if there are any problems with this approach. Are there, for example, ambiguities which it can cause which will not be there with the new implementation?
If you want to use Pair(T, U) then you should consider using a tuple struct instead:
#[derive(Debug, Clone, Copy)]
struct Pair<T, U>(T, U);
fn main(){
let x = Pair(1, 2);
println!("{:?}", x);
println!("{:?}, {:?}", (x.0, x.1));
}
Or, y’know, just a tuple ((T, U)). But I presume that Pair is not your actual use case.
There was a time when having identically named functions was the convention for default constructors; this convention fell out of favour as time went by. It is considered bad form nowadays, probably mostly for consistency. If you have a tuple struct (or variant) Pair(T, U), then you can use Pair(first, last) in a pattern, but if you have Pair { first: T, last: U } then you would need to use something more like Pair { first, last } in a pattern, and so your Pair(first, last) function would be inconsistent with the pattern. It is generally felt, thus, that these type of camel-case functions should be reserved solely for tuple structs and tuple variants, where it can be known that it is genuinely reflecting what is contained in the data structure with no further processing or magic.

Cannot move out of borrowed content when borrowing a generic type

I have a program that more or less looks like this
struct Test<T> {
vec: Vec<T>
}
impl<T> Test<T> {
fn get_first(&self) -> &T {
&self.vec[0]
}
fn do_something_with_x(&self, x: T) {
// Irrelevant
}
}
fn main() {
let t = Test { vec: vec![1i32, 2, 3] };
let x = t.get_first();
t.do_something_with_x(*x);
}
Basically, we call a method on the struct Test that borrows some value. Then we call another method on the same struct, passing the previously obtained value.
This example works perfectly fine. Now, when we make the content of main generic, it doesn't work anymore.
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(*x);
}
Then I get the following error:
error: cannot move out of borrowed content
src/main.rs:14 let raw_x = *x;
I'm not completely sure why this is happening. Can someone explain to me why Test<i32> isn't borrowed when calling get_first while Test<T> is?
The short answer is that i32 implements the Copy trait, but T does not. If you use fn generic_main<T: Copy>(t: Test<T>), then your immediate problem is fixed.
The longer answer is that Copy is a special trait which means values can be copied by simply copying bits. Types like i32 implement Copy. Types like String do not implement Copy because, for example, it requires a heap allocation. If you copied a String just by copying bits, you'd end up with two String values pointing to the same chunk of memory. That would not be good (it's unsafe!).
Therefore, giving your T a Copy bound is quite restrictive. A less restrictive bound would be T: Clone. The Clone trait is similar to Copy (in that it copies values), but it's usually done by more than just "copying bits." For example, the String type will implement Clone by creating a new heap allocation for the underlying memory.
This requires you to change how your generic_main is written:
fn generic_main<T: Clone>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x.clone());
}
Alternatively, if you don't want to have either the Clone or Copy bounds, then you could change your do_something_with_x method to take a reference to T rather than an owned T:
impl<T> Test<T> {
// other methods elided
fn do_something_with_x(&self, x: &T) {
// Irrelevant
}
}
And your generic_main stays mostly the same, except you don't dereference x:
fn generic_main<T>(t: Test<T>) {
let x = t.get_first();
t.do_something_with_x(x);
}
You can read more about Copy in the docs. There are some nice examples, including how to implement Copy for your own types.

Mutable vectors in struct

I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
def __init__(self, target, weight):
self.target = target
self.weight = weight
class WeightedGraph(object):
def __init__(self, initial_size):
self.adjacency_list = [[] for i in range(initial_size)]
self.size = initial_size
self.edge_count = 0
def add_edge(self, source, target, weight):
self.adjacency_list[source].append(Edge(target, weight))
self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
target: uint,
weight: f64
}
struct WeightedGraph {
adjacency_list: ~Vec<~Vec<Edge>>,
size: uint,
edge_count: int
}
impl WeightedGraph {
fn new(num_nodes: uint) -> WeightedGraph {
let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
WeightedGraph {
adjacency_list: adjacency_list,
size: num_nodes,
edge_count: 0
}
}
fn add_edge(mut self, source: uint, target: uint, weight: f64) {
self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
self.edge_count += 1;
}
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references, you have to use get_mut if you want to modify the data
You only need Vec<Vec<Edge>>, Vec is the right thing to use, ~[] was for that purpose in the past but now means something else (or will, not sure if that is changed already)
You also have to change the signature of add_edge to take &mut self because now you are moving the ownership of self to add_edge and that is not what you want

Resources