I'm struggling to turn a simple recursive function into a simple iterator. The problem is that the recursive function maintains state in its local variables and call stack, and turning this into a Rust iterator basically means externalizing all the function state into mutable fields on some custom iterator struct. It's quite a messy endeavor.
In a language like JavaScript or Python, yield comes to the rescue. Are there any techniques in Rust to help manage this complexity?
Simple example using yield (pseudocode):
function one_level(state, depth, max_depth) {
    if depth == max_depth {
        return
    }
    for s in next_states_from(state) {
        yield state_to_value(s);
        yield* one_level(s, depth+1, max_depth);  // delegate to the nested generator
    }
}
To make something similar work in Rust, I'm basically creating a Vec<Vec<State>> on my iterator struct, to reflect the data returned by next_states_from at each level of the call stack. Then for each next() invocation, carefully popping pieces off of this to restore state. I feel like I may be missing something.
You are performing a (depth-limited) depth-first search on your state graph. You can do it iteratively by using a single stack of unprocessed subtrees (depending on your state graph structure).
struct Iter {
    stack: Vec<(State, u32)>,
    max_depth: u32,
}
impl Iter {
    fn new(root: State, max_depth: u32) -> Self {
        Self {
            stack: vec![(root, 0)],
            max_depth,
        }
    }
}
impl Iterator for Iter {
    type Item = u32; // return type of state_to_value
    fn next(&mut self) -> Option<Self::Item> {
        let (state, depth) = self.stack.pop()?;
        // Compute the value first so `state` is not used after being moved
        // (this assumes both helpers take the state by reference).
        let value = state_to_value(&state);
        if depth < self.max_depth {
            for s in next_states_from(&state) {
                self.stack.push((s, depth + 1));
            }
        }
        Some(value)
    }
}
There are some slight differences to your code:
The iterator yields the value of the root element, while your version does not. This is easily fixed with .skip(1).
Children are processed in right-to-left order (reversed from the result of next_states_from). If you need left-to-right order, push the next states in reverse: depending on the result type of next_states_from you can just use .rev(); otherwise you will need a temporary collection.
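To make both differences concrete, here is a runnable sketch with a hypothetical state graph that is not from the question: states are u32, each state n has children 2n and 2n + 1, and a state's value is the state itself. Children are pushed with .rev() so they are visited left-to-right, and .skip(1) drops the root, so under these assumptions it should produce the same sequence as the generator pseudocode.

```rust
// Hypothetical state graph: state n has children 2n and 2n + 1.
fn next_states_from(state: &u32) -> Vec<u32> {
    vec![state * 2, state * 2 + 1]
}

// Hypothetical value function: the value of a state is the state itself.
fn state_to_value(state: &u32) -> u32 {
    *state
}

struct Iter {
    stack: Vec<(u32, u32)>, // (state, depth) pairs still to visit
    max_depth: u32,
}

impl Iter {
    fn new(root: u32, max_depth: u32) -> Self {
        Self { stack: vec![(root, 0)], max_depth }
    }
}

impl Iterator for Iter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let (state, depth) = self.stack.pop()?;
        if depth < self.max_depth {
            // Push children in reverse so the first child is popped first.
            for s in next_states_from(&state).into_iter().rev() {
                self.stack.push((s, depth + 1));
            }
        }
        Some(state_to_value(&state))
    }
}

fn main() {
    // skip(1) drops the root, matching the generator version.
    let values: Vec<u32> = Iter::new(1, 2).skip(1).collect();
    println!("{:?}", values); // [2, 4, 5, 3, 6, 7]
}
```

The explicit stack plays the role of the call stack: each entry is one pending "recursive call", and next() does exactly one step of the recursion before returning.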
I want to implement a stack using pointers or something similar. How can I check if a Box is a null pointer? I've seen some code with Option<Box<T>> and Box<Option<T>>, but I don't understand it. This is as far as I got:
struct Node {
    value: i32,
    next: Box<Node>,
}
struct Stack {
    top: Box<Node>,
}
Box<T> can never be NULL, therefore there is nothing to check.
Box<T> values will always be fully aligned, non-null pointers
(from the std::boxed module documentation)
You most likely wish to use Option to denote the absence / presence of a value:
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}
struct Stack {
    top: Option<Box<Node>>,
}
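As a minimal sketch of how such a Stack can be used, here are push, pop and peek methods (these method names are my own choice, not from the question). Option::take moves the current top out and leaves None behind, so the node can be relinked without ever leaving a "null" hole.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

struct Stack {
    top: Option<Box<Node>>,
}

impl Stack {
    fn new() -> Self {
        Stack { top: None }
    }

    fn push(&mut self, value: i32) {
        // take() leaves None in self.top, so the old top can move into the new node.
        let old_top = self.top.take();
        self.top = Some(Box::new(Node { value, next: old_top }));
    }

    fn pop(&mut self) -> Option<i32> {
        // Detach the top node; if there was one, its successor becomes the new top.
        self.top.take().map(|node| {
            self.top = node.next;
            node.value
        })
    }

    fn peek(&self) -> Option<&i32> {
        self.top.as_ref().map(|node| &node.value)
    }
}

fn main() {
    let mut stack = Stack::new();
    stack.push(1);
    stack.push(2);
    println!("{:?}", stack.peek()); // Some(2)
    println!("{:?}", stack.pop()); // Some(2)
    println!("{:?}", stack.pop()); // Some(1)
    println!("{:?}", stack.pop()); // None
}
```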
See also:
Should we use Option or ptr::null to represent a null pointer in Rust?
How to set a field in a struct with an empty value?
What is the null pointer optimization in Rust?
You don't want null. null is an unsafe antipattern even in languages where you have to use it, and thankfully Rust rids us of the atrocity. Box<T> always contains a T, never null. Safe Rust has no null references; null only appears with raw pointers (e.g. std::ptr::null), which need unsafe code to dereference.
As you've correctly pointed out, if you want a value to be optional, you use Option<T>. Whether you write Box<Option<T>> or Option<Box<T>> doesn't change the meaning much; for efficiency, Option<Box<T>> is generally preferable, because None then needs no heap allocation and the Option adds no size over the Box.
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}
struct Stack {
    top: Option<Box<Node>>,
}
The Option says "this may or may not exist" and the Box says "this value is on the heap". Now, the nice thing about Option that makes it far better than null is that you have to check it. You can't forget, or the compiler will complain. The typical way to do so is with match:
match &my_stack.top {
    None => {
        // Top of stack is not present
    }
    Some(x) => {
        // Top of stack exists, and x is a &Box<Node>
    }
}
There are tons of helper methods on the Option type itself to deal with common patterns. Below are just a few of the most common ones I use. Note that all of these can be implemented in terms of match and are just convenience functions.
The equivalent of the following Java code
if (value == null) {
    result = null;
} else {
    result = ...;
}
is
let result = value.map(|v| ...)
Or, if the inner computation can feasibly produce None as well,
let result = value.and_then(|v| ...)
If you want to provide a default value, say zero, like
if (value == null) {
    result = 0;
} else {
    result = value;
}
Then you want
let result = value.unwrap_or(0);
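A small runnable sketch of the three helpers side by side; the function names and the "halve only even numbers" rule are illustrative assumptions, not from the answer. The comments give the Java-style null check each line replaces.

```rust
// Java: result = (value == null) ? null : value * 2;
fn double(value: Option<i32>) -> Option<i32> {
    value.map(|v| v * 2)
}

// The closure itself may produce None: only even numbers can be halved.
fn half_if_even(value: Option<i32>) -> Option<i32> {
    value.and_then(|v| if v % 2 == 0 { Some(v / 2) } else { None })
}

// Java: result = (value == null) ? 0 : value;
fn or_zero(value: Option<i32>) -> i32 {
    value.unwrap_or(0)
}

fn main() {
    println!("{:?} {:?}", double(Some(4)), double(None)); // Some(8) None
    println!("{:?} {:?}", half_if_even(Some(4)), half_if_even(Some(3))); // Some(2) None
    println!("{} {}", or_zero(Some(7)), or_zero(None)); // 7 0
}
```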
It's probably best to stop thinking in terms of how you would handle null and start learning Option<T> from scratch. Once you get the hang of it, it'll feel ten times safer and more ergonomic than null checks.
A Box<T> is a pointer to some location on the heap that contains some data of type T. Rust guarantees that Box<T> will never be a null pointer, i.e. the address should always be valid as long as you aren't doing anything weird and unsafe.
If you need to represent a value that might not be there (e.g. this node is the last node, so there is no next node), you can use the Option type like so:
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}
struct Stack {
    top: Option<Box<Node>>,
}
Now, with Option<Box<Node>>, a Node can either have a next Node or no next node. We can check whether the Option is Some or None like so:
fn print_next_node_value(node: &Node) {
    match &node.next {
        Some(next) => println!("the next value is {}", next.value),
        None => println!("there is no next node"),
    }
}
Because a Box is just a pointer to some location on the heap, it is usually better to use Option<Box<T>> instead of Box<Option<T>>. The second one always allocates an Option<T> on the heap, even for None, while the first one does not allocate at all in the None case. Additionally, Option<Box<T>> and Box<T> are the same size (8 bytes each on a 64-bit target). This is because Rust knows that Box<T> can never be all zeros (i.e. can never be the null pointer), so it can use the all-zeros state to represent the None case of Option<Box<T>>.
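The size claim can be checked directly with std::mem::size_of. The sketch below only compares sizes relative to each other, so it does not assume a particular pointer width; the byte counts it prints depend on the target.

```rust
use std::mem::size_of;

fn main() {
    // None is encoded as the null pointer, so the Option adds no space.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
    println!("Box<i32>:         {} bytes", size_of::<Box<i32>>());
    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());

    // Box<Option<i32>> is also a single pointer, but it allocates even to hold None.
    let boxed_none: Box<Option<i32>> = Box::new(None);
    assert!(boxed_none.is_none());
}
```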
I have the following:
enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
    // very complicated transformation, reusing parts of x in order to produce result:
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}
fn main() {
    let mut data = vec![
        SomeType::VariantA("hello".to_string()),
        SomeType::VariantA("bye".to_string()),
        SomeType::VariantB("asdf".to_string(), 34),
    ];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was a Vec::map_in_place in Rust, but it was removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
    let original = ::std::mem::replace(x, SomeType::Dummy);
    *x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its elements. One of them has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;
        let old = std::mem::replace(self, VariantA(String::new()));
        // Note this line for the detailed explanation
        *self = match old {
            VariantA(s) => VariantB(s, 0),
            VariantB(s, i) => VariantB(s, 2 * i),
        };
    }
}
for x in &mut data {
    x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
    fn transform(&mut self) {
        use SomeType::*;
        *self = match self {
            VariantA(s) => {
                // mem::take swaps in String::new() and returns the old String
                let s = std::mem::take(s);
                VariantB(s, 0)
            }
            VariantB(s, i) => {
                let s = std::mem::take(s);
                VariantB(s, 2 * *i)
            }
        };
    }
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Often, you can wrap your whole element in Option and call Option::take to achieve the same effect.
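A sketch of the Option::take variant, reusing the question's SomeType and transform (the derives and the transform_all helper are additions for this illustration). None acts as the placeholder, so no Dummy variant is needed; the trade-off is that callers now see Option<SomeType>, and if transform panics the affected slot is left as None.

```rust
#[derive(Debug, PartialEq)]
enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}

fn transform(x: SomeType) -> SomeType {
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}

// Hypothetical helper: each slot is an Option, so None serves as the placeholder.
fn transform_all(data: &mut [Option<SomeType>]) {
    for slot in data.iter_mut() {
        // take() moves the value out and leaves None behind while transform runs.
        if let Some(original) = slot.take() {
            *slot = Some(transform(original));
        }
    }
}

fn main() {
    let mut data = vec![
        Some(SomeType::VariantA("hello".to_string())),
        Some(SomeType::VariantB("asdf".to_string(), 34)),
    ];
    transform_all(&mut data);
    println!("{:?}", data);
}
```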
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's a period of time where your value would be in an undefined state, which is not safe. If a panic were to happen at that exact moment, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, the value of self would still be a concrete, known value. If it were some unknown value, dropping it would try to drop that unknown value (for example, free a garbage String pointer), and we are back in C. This is the purpose of the Dummy value: to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of which deal with unsafe code and reasoning about why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
    F: Fn(T) -> T,
{
    for e in v {
        // Pass &f: passing f by value would move it on the first iteration.
        take_mut::take(e, &f);
    }
}
However, if the supplied function panics, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
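use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
    F: Fn(T) -> T,
{
    for e in v {
        // Park the placeholder in the slot while f runs on the original value
        let mut tmp = mem::replace(e, placeholder);
        tmp = f(tmp);
        // Store the result and get the placeholder back for the next element
        placeholder = mem::replace(e, tmp);
    }
}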
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.
I'm learning Rust and would like to know how I can improve the code below.
I have a vector of tuples of the form (u32, String). The u32 values represent line numbers and the Strings are the text on the corresponding lines. As long as all the String values can be successfully parsed as integers, I want to return an Ok(Vec<i32>) containing the parsed values, but if not I want to return an error of some form (just an Err(String) in the example below).
I'm trying to learn to avoid mutability and use functional styles where appropriate, and the above is straightforward to do functionally if that was all that was needed. Here's what I came up with in this case:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    sv.iter()
        .map(|s| s.1.parse::<i32>()
            .map_err(|_e| "*** Invalid data.".to_string()))
        .collect()
}
However, the small catch is that I want to print an error message for every invalid value (and not just the first one), and the error messages should contain both the line number and the string values in the offending tuple.
I've managed to do it with the following code:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    sv.iter()
        .map(|s| (s.0, s.1.parse::<i32>()
            .or_else(|e| {
                eprintln!("ERROR: Invalid data value at line {}: '{}'",
                          s.0, s.1);
                Err(e)
            })))
        .collect::<Vec<(u32, Result<i32, _>)>>() // Collect here to avoid short-circuit
        .iter()
        .map(|i| i.1
            .clone()
            .map_err(|_e| "*** Invalid data.".to_string()))
        .collect()
}
This works, but seems rather messy and cumbersome - especially the typed collect() in the middle to avoid short-circuiting so all the errors are printed. The clone() call is also annoying, and I'm not really sure why it's needed - the compiler says I'm moving out of borrowed content otherwise, but I'm not really sure what's being moved. Is there a way it can be done more cleanly? Or should I go back to a more procedural style? When I tried, I ended up with mutable variables and a flag to indicate success and failure, which seems less elegant:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    let mut datavals = Vec::new();
    let mut success = true;
    for s in sv {
        match s.1.parse::<i32>() {
            Ok(v) => datavals.push(v),
            Err(_e) => {
                eprintln!("ERROR: Invalid data value at line {}: '{}'",
                          s.0, s.1);
                success = false;
            },
        }
    }
    if success {
        return Ok(datavals);
    } else {
        return Err("*** Invalid data.".to_string());
    }
}
Can someone advise me on the best way to do this? Should I stick to the procedural style here, and if so can that be improved? Or is there a cleaner functional way to do it? Or a blend of the two? Any advice appreciated.
I think that's what partition_map() from itertools is for:
use itertools::{Either, Itertools};
fn data_vals<'a>(sv: &[&'a str]) -> Result<Vec<i32>, Vec<(&'a str, std::num::ParseIntError)>> {
    let (successes, failures): (Vec<_>, Vec<_>) =
        sv.iter().partition_map(|s| match s.parse::<i32>() {
            Ok(v) => Either::Left(v),
            Err(e) => Either::Right((*s, e)),
        });
    if !failures.is_empty() {
        Err(failures)
    } else {
        Ok(successes)
    }
}
fn main() {
    let numbers = vec!["42", "aaaezrgggtht", "..4rez41eza", "55"];
    println!("{:#?}", data_vals(&numbers));
}
In a purely functional style, you have to avoid side effects.
Printing errors is a side effect. The preferred style would be to return a value of type:
Result<Vec<i32>, Vec<String>>
and print the list after the data_vals function returns.
So, essentially, you want your processing to collect a list of integers, and a list of strings:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, Vec<String>> {
    let (ok, err): (Vec<_>, Vec<_>) = sv
        .iter()
        .map(|(i, s)| {
            s.parse::<i32>()
                .map_err(|_e| format!("ERROR: Invalid data value at line {}: '{}'", i, s))
        })
        .partition(|e| e.is_ok());
    if !err.is_empty() {
        // Consume the vectors with into_iter, so no clone() is needed
        Err(err.into_iter().filter_map(Result::err).collect())
    } else {
        Ok(ok.into_iter().filter_map(Result::ok).collect())
    }
}
fn main() {
    let input = vec![(1, "0".to_string())];
    let r = data_vals(&input);
    assert_eq!(r, Ok(vec![0]));
    let input = vec![(1, "zzz".to_string())];
    let r = data_vals(&input);
    assert_eq!(r, Err(vec!["ERROR: Invalid data value at line 1: 'zzz'".to_string()]));
}
This uses partition which does not depend on an external crate.
Side effects (eprintln!) in an iterator adapter are definitely not "functional". You should accumulate and return the errors and let the caller deal with them.
I would use fold here. The goal of fold is to reduce a list to a single value, starting from an initial value and augmenting the result with every item. This "single value" can very well be a list, though. Here, there are two possible lists we might want to return: a list of i32 if all values are valid, or a list of errors if there are any errors (I've chosen to return Strings for errors here, for simplicity).
fn data_vals(sv: &[(u32, String)]) -> Result<Vec<i32>, Vec<String>> {
    sv.iter().fold(
        Ok(Vec::with_capacity(sv.len())),
        |acc, (line_number, data)| {
            let data = data
                .parse::<i32>()
                .map_err(|_| format!("Invalid data value at line {}: '{}'", line_number, data));
            match (acc, data) {
                (Ok(mut acc_data), Ok(this_data)) => {
                    // No errors yet; push the parsed value to the values vector.
                    acc_data.push(this_data);
                    Ok(acc_data)
                }
                (Ok(..), Err(this_error)) => {
                    // First error: replace the accumulator with an `Err` containing the first error.
                    Err(vec![this_error])
                }
                (Err(acc_errors), Ok(..)) => {
                    // There have been errors, but this item is valid; ignore it.
                    Err(acc_errors)
                }
                (Err(mut acc_errors), Err(this_error)) => {
                    // One more error: push it to the error vector.
                    acc_errors.push(this_error);
                    Err(acc_errors)
                }
            }
        },
    )
}
fn main() {
    println!("{:?}", data_vals(&[]));
    println!("{:?}", data_vals(&[(1, "123".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into())]));
    println!("{:?}", data_vals(&[(1, "123".into()), (2, "123a".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123b".into())]));
}
The initial value is Ok(Vec::with_capacity(sv.len())) (this is an optimization to avoid reallocating the vector as we push items to it; a simpler version would be Ok(vec![])). If the slice is empty, this will be fold's result; the closure will never be called.
For each item, the closure checks 1) whether there were any errors so far (indicated by the accumulator value being an Err) or not and 2) whether the current item is valid or not. I'm matching on two Result values simultaneously (by combining them in a tuple) to handle all 4 cases. The closure then returns an Ok if there are no errors so far (with all the parsed values so far) or an Err if there are any errors so far (with every invalid value found so far).
You'll notice I used the push method to add an item to a Vec. This is, strictly speaking, mutation, which is not considered "functional", but because we are moving the Vecs here, we know there are no other references to them, so we know we aren't affecting any other use of these Vecs.
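Since the question also asks whether a blend of procedural and functional style would work, here is a hedged sketch of one: a plain loop that collects errors into their own Vec, so the error list itself replaces the success flag, and the printing moves out to the caller as the other answers suggest. The message format is borrowed from the fold answer; the rest is my own illustration.

```rust
fn data_vals(sv: &[(u32, String)]) -> Result<Vec<i32>, Vec<String>> {
    let mut values = Vec::with_capacity(sv.len());
    let mut errors = Vec::new();
    for (line, text) in sv {
        match text.parse::<i32>() {
            Ok(v) => values.push(v),
            Err(_) => errors.push(format!("Invalid data value at line {}: '{}'", line, text)),
        }
    }
    // An empty error list means every line parsed; no separate flag needed.
    if errors.is_empty() {
        Ok(values)
    } else {
        Err(errors)
    }
}

fn main() {
    let input = vec![
        (1, "10".to_string()),
        (2, "x".to_string()),
        (3, "20".to_string()),
    ];
    // The caller decides how to report the errors.
    match data_vals(&input) {
        Ok(values) => println!("parsed: {:?}", values),
        Err(errors) => {
            for e in &errors {
                eprintln!("ERROR: {}", e);
            }
        }
    }
}
```

The mutation stays local to the function, so from the caller's point of view it behaves like a pure function that returns either all values or all errors.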
I am trying to iterate over the sorted elements in a collection in tuples of 2 or more.
If I had a Vec, I could call
for window in my_vec.windows(2) {
    // do something with window
}
but Vecs aren't implicitly sorted, which would be really nice to have. I tried to use a BTreeSet instead of a Vec, but I don't seem to be able to call windows on it.
When trying to call
for window in tree_set.iter().windows(2) {
    // do something with window
}
I get the error
no method named `windows` found for type `std::collections::btree_set::Iter<'_, Card>` in the current scope
Itertools provides the tuple_windows method:
use itertools::Itertools;
use std::collections::BTreeSet;
fn main() {
    let items: BTreeSet<_> = vec![1, 3, 2].into_iter().collect();
    for (a, b) in items.iter().tuple_windows() {
        println!("{} < {}", a, b);
    }
}
Note that windows is a method on slices, not on iterators, and it returns an iterator of subslices of the original slice. A BTreeSet cannot provide that same interface because it isn't built on top of a contiguous chunk of memory: consecutive values are not necessarily adjacent in memory, so there is no subslice to hand out.
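If you'd rather avoid the itertools dependency for pairs, one std-only option (a swapped-in technique, not from the answer above) is to zip the set's iterator with a second iterator advanced by one. The pairs helper below is a hypothetical name for this sketch; for windows longer than 2, collecting into a Vec and calling .windows(n) on it is another std-only route.

```rust
use std::collections::BTreeSet;

// Hypothetical helper: each element paired with its successor, in sorted order.
fn pairs(items: &BTreeSet<i32>) -> Vec<(i32, i32)> {
    items
        .iter()
        .zip(items.iter().skip(1))
        .map(|(a, b)| (*a, *b))
        .collect()
}

fn main() {
    let items: BTreeSet<i32> = vec![1, 3, 2].into_iter().collect();
    for (a, b) in pairs(&items) {
        println!("{} < {}", a, b);
    }
}
```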
In our current application we have a need to traverse down a tree and capture all operators on a specific device (and child devices). A device could have child devices with also specific operators on it.
As I am new to using recursion in Groovy, I am wondering whether I am doing things right.
Any pointers to help me learn better ways of doing things?
def listOperators(device) {
    // list with all operator id's
    def results = []
    // closure to traverse down the tree
    def getAllOperators = { aDevice->
        if(aDevice) {
            aDevice.operators.each { it ->
                results << it.id
            }
        }
        if (aDevice?.children) {
            aDevice.children.each { child ->
                results << owner.call(child)
            }
        }
    }
    // call the closure with the given device
    getAllOperators(device)
    // return list with unique results
    return results.unique()
}
A couple things to note:
Doing the recursive call through owner is not a good idea. The definition of owner changes if the call is nested within another closure. It's error-prone and has no advantages over just using the name. When the closure is a local variable, split up the declaration and definition of the closure so the name is in scope. E.g.:
def getAllOperators
getAllOperators = { ...
You are appending the operators to a result list outside the recursive closure. But you are also appending the result of each recursive call to the same list. Either append to the list or store the results from each recursive call, but not both.
Here's a simpler alternative:
def listOperators(device) {
    def results = []
    if (device) {
        results += device.operators*.id
        device.children?.each { child ->
            results += listOperators(child)
        }
    }
    results.unique()
}