Can I get rid of "mut" here? - functional-programming

I have a vector of strings and I want to extract some data from them and create a structure out of it. It looks something like this:
let mut my_struct = MyStruct::new(0, 0, 0);
let regex1 = Regex::new("...");
let regex2 = Regex::new("...");
for s_iter in my_str_vec.iter() {
if regex1.is_match(s_iter) {
// parsing the value
// .........
let var1 = regex1.captures("....");
// and assign it to to a field of the struct instance
my_struct.field1 = var1;
}
// do the same thing for other fields each step in the loop
// each step only one regex gets triggered
if regex2.is_match(s_iter) {
// parsing the value
// .........
let var2 = regex12.captures("....");
// and assign it to to a field of the struct instance
my_struct.field2 = var2;
}
}
// now "my_struct" is now completely initialized
As you can see, I have to use mut for the structure. Is there any way to do that without mut? I want to be able to initialize the struct all at once, without mut. Or I can consider other options without mut also.

In a purely functional language, you would need to define a recursive function. In Rust, it would look like this:
fn parse(my_str_vec: Vec<&str>) -> MyStruct {
let my_struct = MyStruct::new(0, 0, 0);
let regex1 = Regex::new(...);
let regex2 = Regex::new(...);
fn process_next<I>(regex1: Regex, regex2: Regex, mut str_iter: I, my_struct: MyStruct) -> MyStruct
where I: Iterator, I::Item: AsRef<str>
{
match str_iter.next() {
Some(s_iter) => {
let my_struct = if regex1.is_match(s_iter.as_ref()) {
let var1 = regex1.captures("var1");
MyStruct { field1: var1, ..my_struct }
} else if regex2.is_match(s_iter.as_ref()) {
let var2 = regex2.captures("var2");
MyStruct { field2: var2, ..my_struct }
} else {
my_struct
};
process_next(regex1, regex2, str_iter, my_struct)
}
None => my_struct
}
}
process_next(regex1, regex2, my_str_vec.iter(), my_struct)
}
Note that there's still a mut in this code: we have to define str_iter as mutable, because calling next() requires the receiver to be mutable.
While the inner function is tail recursive, Rust does not guarantee tail calls, so it could crash with a stack overflow if the input is too large. Personally, I'd rather use mut here, as in your original code, especially since mut in Rust implies no aliasing, which eliminates a whole class of potential bugs.

Related

How does Rust model iterators? Stack or Heap?

I know that vectors in Rust are allocated on the heap where the pointer, capacity, and length of the vector are stored on the stack.
Let's say I have the following vector:
let vec = vec![1, 2, 3];
If I make an iterator from this vector:
let vec_iter = vec.iter();
How does Rust model this iterator in terms of allocation on the heap vs. stack? Is it the same as the vector?
Most iterators are stack allocated.
In cases like Vec::iter(), they create iterators that have two pointers, one to the end, one to the first element, like so
use std::marker::PhantomData;
pub struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: PhantomData<&'a T>,
}
Since pointer doesn't convey ownership or lifetime, PhantomData<&'a T> tells the compiler this struct holds reference of lifetime 'a to type T
Iter::next looks somewhat like this
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item> {
unsafe {// pointer dereferencing is only allowed in unsafe
if self.ptr == self.end {
None
} else {
let old = self.ptr;
self.ptr = self.ptr.offset(1);
Some(&*old)
}
}
}
}
And a new Iter is created like so
impl<'a, T: 'a> Iter<'a, T> {
pub fn new(slice: &'a [T]) -> Self {
assert_ne!(std::mem::size_of::<T>(), 0); // doesn't handle zero size type
let start = slice.as_ptr();
Iter {
ptr: start,
end: unsafe { start.add(slice.len()) },
_marker: PhantomData,
}
}
}
Now we can use it like any other iterators
let v = vec!['a', 'b', 'c', 'd', 'e'];
for c in Iter::new(&v) {
println!("{c}");
}
And thanks to PhantomData, the compiler can guard us against use after free and other memory issues.
let iter = {
let v = vec!['a', 'b', 'c', 'd', 'e'];
Iter::new(&v) // error! borrowed value doesn't live long enough
};
for c in iter {
println!("{c}");
}
the pointer, capacity, and length of the vector are stored on the stack
→ not really. They are stored wherever the user wishes, which may be on the stack, in the global data segment, or on the heap:
// In the global data segment
static VEC: Vec<()> = Vec::new();
struct Foo {
v: Vec<()>,
}
fn main() {
// On the stack
let v: Vec<()> = Vec::new();
// On the heap
let f = Box::new (Foo { v: Vec::new(), });
}
And the same goes for iterators. Most of the time they simply hold references to the original data, wherever that may be, and the references themselves are stored inside the iterator struct wherever the user puts it.

Rust: Joining and iterating over futures' results

I have some code that iterates over objects and uses an async method on each of them sequentially before doing something with the results. I'd like to change it so that the async method calls are joined into a single future before being executed. The important bit below is in HolderStruct::add_squares. My current code looks like this:
use anyhow::Result;
struct AsyncMethodStruct {
value: u64
}
impl AsyncMethodStruct {
fn new(value: u64) -> Self {
AsyncMethodStruct {
value
}
}
async fn get_square(&self) -> Result<u64> {
Ok(self.value * self.value)
}
}
struct HolderStruct {
async_structs: Vec<AsyncMethodStruct>
}
impl HolderStruct {
fn new(async_structs: Vec<AsyncMethodStruct>) -> Self {
HolderStruct {
async_structs
}
}
async fn add_squares(&self) -> Result<u64> {
let mut squares = Vec::with_capacity(self.async_structs.len());
for async_struct in self.async_structs.iter() {
squares.push(async_struct.get_square().await?);
}
let mut sum = 0;
for square in squares.iter() {
sum += square;
}
return Ok(sum);
}
}
I'd like to change HolderStruct::add_squares to something like this:
use futures::future::join_all;
// [...]
impl HolderStruct {
async fn add_squares(&self) -> Result<u64> {
let mut square_futures = Vec::with_capacity(self.async_structs.len());
for async_struct in self.async_structs.iter() {
square_futures.push(async_struct.get_square());
}
let square_results = join_all(square_futures).await;
let mut sum = 0;
for square_result in square_results.iter() {
sum += square_result?;
}
return Ok(sum);
}
}
However, the compiler gives me this error using the above:
error[E0277]: the `?` operator can only be applied to values that implement `std::ops::Try`
--> src/main.rs:46:20
|
46 | sum += square_result?;
| ^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `&std::result::Result<u64, anyhow::Error>`
|
= help: the trait `std::ops::Try` is not implemented for `&std::result::Result<u64, anyhow::Error>`
= note: required by `std::ops::Try::into_result`
How would I change the code to not have this error?
for square_result in square_results.iter()
Lose the iter() call here.
for square_result in square_results
You seem to be under impression that calling iter() is mandatory to iterate over a collection. Actually, anything that implements IntoIterator can be used in a for loop.
Calling iter() on a Vec<T> derefs to slice (&[T]) and yields an iterator over references to the vectors elements. The ? operator tries to take the value out of the Result, but that is only possible if you own the Result rather than just have a reference to it.
However, if you simply use a vector itself in a for statement, it will use the IntoIterator implementation for Vec<T> which will yield items of type T rather than &T.
square_results.into_iter() does the same thing, albeit more verbosely. It is mostly useful when using iterators in a functional style, a la vector.into_iter().map(|x| x + 1).collect().

Is it possible to have a closure call itself in Rust? [duplicate]

This is a very simple example, but how would I do something similar to:
let fact = |x: u32| {
match x {
0 => 1,
_ => x * fact(x - 1),
}
};
I know that this specific example can be easily done with iteration, but I'm wondering if it's possible to make a recursive function in Rust for more complicated things (such as traversing trees) or if I'm required to use my own stack instead.
There are a few ways to do this.
You can put closures into a struct and pass this struct to the closure. You can even define structs inline in a function:
fn main() {
struct Fact<'s> { f: &'s dyn Fn(&Fact, u32) -> u32 }
let fact = Fact {
f: &|fact, x| if x == 0 {1} else {x * (fact.f)(fact, x - 1)}
};
println!("{}", (fact.f)(&fact, 5));
}
This gets around the problem of having an infinite type (a function that takes itself as an argument) and the problem that fact isn't yet defined inside the closure itself when one writes let fact = |x| {...} and so one can't refer to it there.
Another option is to just write a recursive function as a fn item, which can also be defined inline in a function:
fn main() {
fn fact(x: u32) -> u32 { if x == 0 {1} else {x * fact(x - 1)} }
println!("{}", fact(5));
}
This works fine if you don't need to capture anything from the environment.
One more option is to use the fn item solution but explicitly pass the args/environment you want.
fn main() {
struct FactEnv { base_case: u32 }
fn fact(env: &FactEnv, x: u32) -> u32 {
if x == 0 {env.base_case} else {x * fact(env, x - 1)}
}
let env = FactEnv { base_case: 1 };
println!("{}", fact(&env, 5));
}
All of these work with Rust 1.17 and have probably worked since version 0.6. The fn's defined inside fns are no different to those defined at the top level, except they are only accessible within the fn they are defined inside.
As of Rust 1.62 (July 2022), there's still no direct way to recurse in a closure. As the other answers have pointed out, you need at least a bit of indirection, like passing the closure to itself as an argument, or moving it into a cell after creating it. These things can work, but in my opinion they're kind of gross, and they're definitely hard for Rust beginners to follow. If you want to use recursion but you have to have a closure, for example because you need something that implements FnOnce() to use with thread::spawn, then I think the cleanest approach is to use a regular fn function for the recursive part and to wrap it in a non-recursive closure that captures the environment. Here's an example:
let x = 5;
let fact = || {
fn helper(arg: u64) -> u64 {
match arg {
0 => 1,
_ => arg * helper(arg - 1),
}
}
helper(x)
};
assert_eq!(120, fact());
Here's a really ugly and verbose solution I came up with:
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
fn main() {
let weak_holder: Rc<RefCell<Weak<dyn Fn(u32) -> u32>>> =
Rc::new(RefCell::new(Weak::<fn(u32) -> u32>::new()));
let weak_holder2 = weak_holder.clone();
let fact: Rc<dyn Fn(u32) -> u32> = Rc::new(move |x| {
let fact = weak_holder2.borrow().upgrade().unwrap();
if x == 0 {
1
} else {
x * fact(x - 1)
}
});
weak_holder.replace(Rc::downgrade(&fact));
println!("{}", fact(5)); // prints "120"
println!("{}", fact(6)); // prints "720"
}
The advantages of this are that you call the function with the expected signature (no extra arguments needed), it's a closure that can capture variables (by move), it doesn't require defining any new structs, and the closure can be returned from the function or otherwise stored in a place that outlives the scope where it was created (as an Rc<Fn...>) and it still works.
Closure is just a struct with additional contexts. Therefore, you can do this to achieve recursion (suppose you want to do factorial with recursive mutable sum):
#[derive(Default)]
struct Fact {
ans: i32,
}
impl Fact {
fn call(&mut self, n: i32) -> i32 {
if n == 0 {
self.ans = 1;
return 1;
}
self.call(n - 1);
self.ans *= n;
self.ans
}
}
To use this struct, just:
let mut fact = Fact::default();
let ans = fact.call(5);

Is it possible to make a recursive closure in Rust?

This is a very simple example, but how would I do something similar to:
let fact = |x: u32| {
match x {
0 => 1,
_ => x * fact(x - 1),
}
};
I know that this specific example can be easily done with iteration, but I'm wondering if it's possible to make a recursive function in Rust for more complicated things (such as traversing trees) or if I'm required to use my own stack instead.
There are a few ways to do this.
You can put closures into a struct and pass this struct to the closure. You can even define structs inline in a function:
fn main() {
struct Fact<'s> { f: &'s dyn Fn(&Fact, u32) -> u32 }
let fact = Fact {
f: &|fact, x| if x == 0 {1} else {x * (fact.f)(fact, x - 1)}
};
println!("{}", (fact.f)(&fact, 5));
}
This gets around the problem of having an infinite type (a function that takes itself as an argument) and the problem that fact isn't yet defined inside the closure itself when one writes let fact = |x| {...} and so one can't refer to it there.
Another option is to just write a recursive function as a fn item, which can also be defined inline in a function:
fn main() {
fn fact(x: u32) -> u32 { if x == 0 {1} else {x * fact(x - 1)} }
println!("{}", fact(5));
}
This works fine if you don't need to capture anything from the environment.
One more option is to use the fn item solution but explicitly pass the args/environment you want.
fn main() {
struct FactEnv { base_case: u32 }
fn fact(env: &FactEnv, x: u32) -> u32 {
if x == 0 {env.base_case} else {x * fact(env, x - 1)}
}
let env = FactEnv { base_case: 1 };
println!("{}", fact(&env, 5));
}
All of these work with Rust 1.17 and have probably worked since version 0.6. The fn's defined inside fns are no different to those defined at the top level, except they are only accessible within the fn they are defined inside.
As of Rust 1.62 (July 2022), there's still no direct way to recurse in a closure. As the other answers have pointed out, you need at least a bit of indirection, like passing the closure to itself as an argument, or moving it into a cell after creating it. These things can work, but in my opinion they're kind of gross, and they're definitely hard for Rust beginners to follow. If you want to use recursion but you have to have a closure, for example because you need something that implements FnOnce() to use with thread::spawn, then I think the cleanest approach is to use a regular fn function for the recursive part and to wrap it in a non-recursive closure that captures the environment. Here's an example:
let x = 5;
let fact = || {
fn helper(arg: u64) -> u64 {
match arg {
0 => 1,
_ => arg * helper(arg - 1),
}
}
helper(x)
};
assert_eq!(120, fact());
Here's a really ugly and verbose solution I came up with:
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
fn main() {
let weak_holder: Rc<RefCell<Weak<dyn Fn(u32) -> u32>>> =
Rc::new(RefCell::new(Weak::<fn(u32) -> u32>::new()));
let weak_holder2 = weak_holder.clone();
let fact: Rc<dyn Fn(u32) -> u32> = Rc::new(move |x| {
let fact = weak_holder2.borrow().upgrade().unwrap();
if x == 0 {
1
} else {
x * fact(x - 1)
}
});
weak_holder.replace(Rc::downgrade(&fact));
println!("{}", fact(5)); // prints "120"
println!("{}", fact(6)); // prints "720"
}
The advantages of this are that you call the function with the expected signature (no extra arguments needed), it's a closure that can capture variables (by move), it doesn't require defining any new structs, and the closure can be returned from the function or otherwise stored in a place that outlives the scope where it was created (as an Rc<Fn...>) and it still works.
Closure is just a struct with additional contexts. Therefore, you can do this to achieve recursion (suppose you want to do factorial with recursive mutable sum):
#[derive(Default)]
struct Fact {
ans: i32,
}
impl Fact {
fn call(&mut self, n: i32) -> i32 {
if n == 0 {
self.ans = 1;
return 1;
}
self.call(n - 1);
self.ans *= n;
self.ans
}
}
To use this struct, just:
let mut fact = Fact::default();
let ans = fact.call(5);

Owned pointer in graph structure

With the generous help of the rust community I managed to get the base of a topological data structure assembled using managed pointers. This came together rather nicely and I was pretty excited about Rust in general. Then I read this post (which seems like a reasonable plan) and it inspired me to back track and try to re-assemble it using only owned pointers if possible.
This is the working version using managed pointers:
struct Dart<T> {
alpha: ~[#mut Dart<T>],
embed: ~[#mut T],
tagged: bool
}
impl<T> Dart<T> {
pub fn new(dim: uint) -> #mut Dart<T> {
let mut dart = #mut Dart{alpha: ~[], embed: ~[], tagged: false};
dart.alpha = vec::from_elem(dim, dart);
return dart;
}
pub fn get_dim(&self) -> uint {
return self.alpha.len();
}
pub fn traverse(#mut self, invs: &[uint], f: &fn(&Dart<T>)) {
let dim = self.get_dim();
for invs.each |i| {if *i >= dim {return}}; //test bounds on invs vec
if invs.len() == 2 {
let spread:int = int::abs(invs[1] as int - invs[0] as int);
if spread == 1 { //simple loop
let mut dart = self;
let mut i = invs[0];
while !dart.tagged {
dart.tagged = true;
f(dart);
dart = dart.alpha[i];
if i == invs[0] {i = invs[1];}
else {i == invs[0];}
} }
// else if spread == 2 { // max 4 cells traversed
// }
}
else {
let mut stack = ~[self];
self.tagged = true;
while !stack.is_empty() {
let mut dart = stack.pop();
f(dart);
for invs.each |i| {
if !dart.alpha[*i].tagged {
dart.alpha[*i].tagged = true;
stack.push(dart);
} } } } } }
After a few hours of chasing lifetime errors I have come to the conclusion that this may not even be possible with owned pointers due to the cyclic nature (without tying the knot as I was warned). My feeble attempt at this is below. My question, is this structure possible to implement without resorting to managed pointers? And if not, is the code above considered reasonably "rusty"? (idiomatic rust). Thanks.
struct GMap<'self,T> {
dim: uint,
darts: ~[~Dart<'self,T>]
}
struct Dart<'self,T> {
alpha: ~[&'self mut Dart<'self, T>],
embed: ~[&'self mut T],
tagged: bool
}
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let mut dart = ~Dart{alpha: ~[], embed: ~[], tagged: false};
let dartRef: &'self mut Dart<'self, T> = dart;
dartRef.alpha = vec::from_elem(self.dim, copy dartRef);
self.darts.push(dart);
}
}
I'm pretty sure that using &mut pointers is impossible, since one can only have one such pointer in existence at a time, e.g.:
fn main() {
let mut i = 0;
let a = &mut i;
let b = &mut i;
}
and-mut.rs:4:12: 4:18 error: cannot borrow `i` as mutable more than once at at a time
and-mut.rs:4 let b = &mut i;
^~~~~~
and-mut.rs:3:12: 3:18 note: second borrow of `i` as mutable occurs here
and-mut.rs:3 let a = &mut i;
^~~~~~
error: aborting due to previous error
One could get around the borrow checker unsafely, by either storing unsafe pointer to the memory (ptr::to_mut_unsafe_ptr), or indices into the darts member of GMap. Essentially, storing a single reference to the memory (in self.darts) and all operations have to go through it.
This might look like:
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let ind = self.darts.len();
self.darts.push(~Dart{alpha: vec::from_elem(self.dim, ind), embed: ~[], tagged: false});
}
}
traverse would need to change to either be a method on GMap (e.g. fn(&mut self, node_ind: uint, invs: &[uint], f: &fn(&Dart<T>))), or at least take a GMap type.
(On an entirely different note, there is library support for external iterators, which are far more composable than the internal iterators (the ones that take a closure). So defining one of these for traverse may (or may not) make using it nicer.)

Resources