With the generous help of the Rust community I managed to get the base of a topological data structure assembled using managed pointers. This came together rather nicely and I was pretty excited about Rust in general. Then I read this post (which seems like a reasonable plan) and it inspired me to backtrack and try to reassemble it using only owned pointers if possible.
This is the working version using managed pointers:
struct Dart<T> {
    alpha: ~[@mut Dart<T>],
    embed: ~[@mut T],
    tagged: bool
}
impl<T> Dart<T> {
    pub fn new(dim: uint) -> @mut Dart<T> {
        let mut dart = @mut Dart{alpha: ~[], embed: ~[], tagged: false};
        dart.alpha = vec::from_elem(dim, dart);
        return dart;
    }
    pub fn get_dim(&self) -> uint {
        return self.alpha.len();
    }
    pub fn traverse(@mut self, invs: &[uint], f: &fn(&Dart<T>)) {
        let dim = self.get_dim();
        for invs.each |i| {if *i >= dim {return}}; //test bounds on invs vec
        if invs.len() == 2 {
            let spread:int = int::abs(invs[1] as int - invs[0] as int);
            if spread == 1 { //simple loop
                let mut dart = self;
                let mut i = invs[0];
                while !dart.tagged {
                    dart.tagged = true;
                    f(dart);
                    dart = dart.alpha[i];
                    // alternate between the two involutions
                    if i == invs[0] {i = invs[1];}
                    else {i = invs[0];}
                }
            }
            // else if spread == 2 { // max 4 cells traversed
            // }
        }
        else {
            let mut stack = ~[self];
            self.tagged = true;
            while !stack.is_empty() {
                let mut dart = stack.pop();
                f(dart);
                for invs.each |i| {
                    if !dart.alpha[*i].tagged {
                        dart.alpha[*i].tagged = true;
                        // schedule the untagged neighbour for a later visit
                        stack.push(dart.alpha[*i]);
                    }
                }
            }
        }
    }
}
After a few hours of chasing lifetime errors I have come to the conclusion that this may not even be possible with owned pointers due to the cyclic nature (without tying the knot, as I was warned). My feeble attempt at this is below. My question: is this structure possible to implement without resorting to managed pointers? And if not, is the code above considered reasonably "rusty" (idiomatic Rust)? Thanks.
struct GMap<'self,T> {
dim: uint,
darts: ~[~Dart<'self,T>]
}
struct Dart<'self,T> {
alpha: ~[&'self mut Dart<'self, T>],
embed: ~[&'self mut T],
tagged: bool
}
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let mut dart = ~Dart{alpha: ~[], embed: ~[], tagged: false};
let dartRef: &'self mut Dart<'self, T> = dart;
dartRef.alpha = vec::from_elem(self.dim, copy dartRef);
self.darts.push(dart);
}
}
I'm pretty sure that using &mut pointers is impossible, since one can only have one such pointer in existence at a time, e.g.:
fn main() {
let mut i = 0;
let a = &mut i;
let b = &mut i;
}
and-mut.rs:4:12: 4:18 error: cannot borrow `i` as mutable more than once at at a time
and-mut.rs:4 let b = &mut i;
^~~~~~
and-mut.rs:3:12: 3:18 note: second borrow of `i` as mutable occurs here
and-mut.rs:3 let a = &mut i;
^~~~~~
error: aborting due to previous error
One could get around the borrow checker either unsafely, by storing an unsafe pointer to the memory (ptr::to_mut_unsafe_ptr), or by storing indices into the darts member of GMap. The latter essentially keeps a single owner of the memory (self.darts), and all operations have to go through it.
With indices (so alpha holds uint values rather than pointers), this might look like:
impl<'self, T> GMap<'self, T> {
pub fn new_dart(&'self mut self) {
let ind = self.darts.len();
self.darts.push(~Dart{alpha: vec::from_elem(self.dim, ind), embed: ~[], tagged: false});
}
}
traverse would need to change to either be a method on GMap (e.g. fn(&mut self, node_ind: uint, invs: &[uint], f: &fn(&Dart<T>))), or at least take a GMap type.
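For readers on current Rust, here is a minimal sketch of that index-based layout with traverse as a method on GMap. It is only an illustration under assumptions: ~[T] and uint are written as Vec<T> and usize, and the sew helper and the exact closure signature are made up for this example rather than taken from the original code.

```rust
/// One dart; `alpha[i]` is the index (into `GMap::darts`) of the dart
/// reached through involution `i`.
struct Dart<T> {
    alpha: Vec<usize>,
    embed: Vec<T>,
    tagged: bool,
}

struct GMap<T> {
    dim: usize,
    darts: Vec<Dart<T>>,
}

impl<T> GMap<T> {
    fn new(dim: usize) -> Self {
        GMap { dim, darts: Vec::new() }
    }

    /// Adds a free dart (every involution maps it to itself) and returns its index.
    fn new_dart(&mut self) -> usize {
        let ind = self.darts.len();
        self.darts.push(Dart { alpha: vec![ind; self.dim], embed: Vec::new(), tagged: false });
        ind
    }

    /// Hypothetical helper: links darts `a` and `b` through involution `i`.
    fn sew(&mut self, i: usize, a: usize, b: usize) {
        self.darts[a].alpha[i] = b;
        self.darts[b].alpha[i] = a;
    }

    /// Calls `f` on every dart reachable from `start` via the involutions in `invs`.
    /// Tags are left set afterwards, as in the original version.
    fn traverse(&mut self, start: usize, invs: &[usize], mut f: impl FnMut(usize, &Dart<T>)) {
        // Bail out on an invalid start index or involution, like the bounds test above.
        if start >= self.darts.len() || invs.iter().any(|&i| i >= self.dim) {
            return;
        }
        let mut stack = vec![start];
        self.darts[start].tagged = true;
        while let Some(d) = stack.pop() {
            f(d, &self.darts[d]);
            for &i in invs {
                let next = self.darts[d].alpha[i];
                if !self.darts[next].tagged {
                    self.darts[next].tagged = true;
                    stack.push(next);
                }
            }
        }
    }
}
```

Because darts refer to each other only by index, the cycles never show up as references, so the borrow checker has nothing to object to. The price is that an index can go stale if darts are ever removed.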
(On an entirely different note, there is library support for external iterators, which are far more composable than the internal iterators (the ones that take a closure). So defining one of these for traverse may (or may not) make using it nicer.)
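As a rough illustration of the external-iterator idea in current Rust, the sketch below walks the darts reachable from a start index. The Orbit name, the stripped-down Dart type and the separate visited table (used instead of the tagged flag) are all assumptions made for this example.

```rust
/// Hypothetical index-based dart: `alpha[i]` is the index of the
/// neighbour reached through involution `i`.
struct Dart {
    alpha: Vec<usize>,
}

/// External iterator over the indices of all darts reachable from `start`.
/// Indexing panics if `start` or an entry of `invs` is out of range.
struct Orbit<'a> {
    darts: &'a [Dart],
    invs: &'a [usize],
    visited: Vec<bool>,
    stack: Vec<usize>,
}

impl<'a> Orbit<'a> {
    fn new(darts: &'a [Dart], start: usize, invs: &'a [usize]) -> Self {
        let mut visited = vec![false; darts.len()];
        visited[start] = true;
        Orbit { darts, invs, visited, stack: vec![start] }
    }
}

impl<'a> Iterator for Orbit<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Take the next pending dart, or finish when none are left.
        let d = self.stack.pop()?;
        for &i in self.invs {
            let n = self.darts[d].alpha[i];
            if !self.visited[n] {
                self.visited[n] = true;
                self.stack.push(n);
            }
        }
        Some(d)
    }
}
```

Since the visited marks live in the iterator, it only needs a shared borrow of the darts, and the usual adaptors (filter, map, count, and so on) compose with it.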
I know that vectors in Rust are allocated on the heap, while the pointer, capacity, and length of the vector are stored on the stack.
Let's say I have the following vector:
let vec = vec![1, 2, 3];
If I make an iterator from this vector:
let vec_iter = vec.iter();
How does Rust model this iterator in terms of allocation on the heap vs. stack? Is it the same as the vector?
Most iterators are stack allocated.
Methods like Vec::iter() create iterators that hold two pointers, one to the first remaining element and one just past the last element, like so:
use std::marker::PhantomData;
pub struct Iter<'a, T: 'a> {
ptr: *const T,
end: *const T,
_marker: PhantomData<&'a T>,
}
Since a raw pointer doesn't convey ownership or a lifetime, PhantomData<&'a T> tells the compiler that this struct behaves as if it held a reference of lifetime 'a to a T.
Iter::next looks somewhat like this
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item> {
unsafe {// pointer dereferencing is only allowed in unsafe
if self.ptr == self.end {
None
} else {
let old = self.ptr;
self.ptr = self.ptr.offset(1);
Some(&*old)
}
}
}
}
And a new Iter is created like so
impl<'a, T: 'a> Iter<'a, T> {
pub fn new(slice: &'a [T]) -> Self {
assert_ne!(std::mem::size_of::<T>(), 0); // doesn't handle zero size type
let start = slice.as_ptr();
Iter {
ptr: start,
end: unsafe { start.add(slice.len()) },
_marker: PhantomData,
}
}
}
Now we can use it like any other iterator:
let v = vec!['a', 'b', 'c', 'd', 'e'];
for c in Iter::new(&v) {
println!("{c}");
}
And thanks to PhantomData, the compiler can guard us against use after free and other memory issues.
let iter = {
let v = vec!['a', 'b', 'c', 'd', 'e'];
Iter::new(&v) // error! borrowed value doesn't live long enough
};
for c in iter {
println!("{c}");
}
"the pointer, capacity, and length of the vector are stored on the stack"
Not really. They are stored wherever the user wishes, which may be on the stack, in the global data segment, or on the heap:
// In the global data segment
static VEC: Vec<()> = Vec::new();
struct Foo {
v: Vec<()>,
}
fn main() {
// On the stack
let v: Vec<()> = Vec::new();
// On the heap
let f = Box::new (Foo { v: Vec::new(), });
}
And the same goes for iterators. Most of the time they simply hold references to the original data, wherever that may be, and the references themselves are stored inside the iterator struct wherever the user puts it.
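A small sketch to make this concrete. The function names are invented for the example, and the two-word size is what current standard library implementations use for slice iterators rather than a documented guarantee.

```rust
use std::mem::size_of;

/// The iterator struct itself can live anywhere; here it is moved to the
/// heap, while still borrowing the elements from wherever `v` points.
fn boxed_iter(v: &[i32]) -> Box<dyn Iterator<Item = &i32> + '_> {
    Box::new(v.iter())
}

/// A slice iterator is just a pair of pointers (plus a zero-sized marker),
/// so it occupies two machine words regardless of how long the slice is.
fn iter_is_two_words() -> bool {
    size_of::<std::slice::Iter<'_, i32>>() == 2 * size_of::<usize>()
}
```

Calling boxed_iter(&vec) performs one small heap allocation for the iterator struct, but the elements themselves are neither copied nor moved.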
I'm implementing some recursive code, where function instances deeper down in the call stack may need to refer to data from prior frames. However, I only have non-mut access to those data, so I receive those data as references. As such, I would need to keep references to those data in a stack data structure that can be accessed from the deeper instances.
To illustrate:
// I would like to implement this RefStack class properly, without per-item memory allocations
struct RefStack<T: ?Sized> {
content: Vec<&T>,
}
impl<T: ?Sized> RefStack<T> {
fn new() -> Self { Self{ content: Vec::new() } }
fn get(&self, index: usize) -> &T { self.content[index] }
fn len(&self) -> usize { self.content.len() }
fn with_element<F: FnOnce(&mut Self)>(&mut self, el: &T, f: F) {
self.content.push(el);
f(self);
self.content.pop();
}
}
// This is just an example demonstrating how I would need to use the RefStack class
fn do_recursion(n: usize, node: &LinkedListNode, st: &mut RefStack<str>) {
// get references to one or more items in the stack
// the references should be allowed to live until the end of this function, but shouldn't prevent me from calling with_element() later
let tmp: &str = st.get(rng.gen_range(0, st.len()));
// do stuff with those references (println is just an example)
println!("Item: {}", tmp);
// recurse deeper if necessary
if n > 0 {
let (head, tail): (_, &LinkedListNode) = node.get_parts();
manager.get_str(head, |s: &str| // the actual string is a local variable somewhere in the implementation details of get_str()
st.with_element(s, |st| do_recursion(n - 1, tail, st))
);
}
// do more stuff with those references (println is just an example)
println!("Item: {}", tmp);
}
fn main() {
do_recursion(100, list /* gotten from somewhere else */, &mut RefStack::new());
}
In the example above, I'm concerned about how to implement RefStack without any per-item memory allocations. The occasional allocations by the Vec are acceptable, as those are few and far between. The LinkedListNode is just an example. In practice it's some complicated graph data structure, but the same thing applies: I only have a non-mut reference to it, and the closure given to manager.get_str() only provides a non-mut str. Note that the non-mut str passed into the closure may only be constructed in the get_str() implementation, so we cannot assume that all the &str have the same lifetime.
I'm fairly certain that RefStack can't be implemented in safe Rust without copying out the str into owned Strings, so my question is how this can be done in unsafe Rust. It feels like I might be able to get a solution such that:
- The unsafeness is confined to the implementation of RefStack
- The reference returned by st.get() should live at least as long as the current instance of the do_recursion function (in particular, it should be able to live past the call to st.with_element(), and this is logically safe since the &T that is returned by st.get() isn't referring to any memory owned by the RefStack anyway)
How can such a struct be implemented in (unsafe) Rust?
It feels that I could just cast the element references to pointers and store them as pointers, but I will still face difficulties expressing the requirement in the second bullet point above when casting them back to references. Or is there a better way (or by any chance is such a struct implementable in safe Rust, or already in some library somewhere)?
I think storing raw pointers is the way to go. You need a PhantomData to store the lifetime and get proper covariance:
use std::marker::PhantomData;
struct RefStack<'a, T: ?Sized> {
content: Vec<*const T>,
_pd: PhantomData<&'a T>,
}
impl<'a, T: ?Sized> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),_pd: PhantomData
}
}
fn get(&self, index: usize) -> &'a T {
unsafe { &*self.content[index] }
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F: FnOnce(&mut RefStack<'t, T>)>(&mut self, el: &'t T, f: F)
where 'a: 't,
{
self.content.push(el);
let mut tmp = RefStack {
content: std::mem::take(&mut self.content),
_pd: PhantomData,
};
f(&mut tmp);
self.content = tmp.content;
self.content.pop();
}
}
The only unsafe code is in converting the pointer back into a reference.
The tricky part is getting with_element right. I think that the where 'a: 't is implicit, because the whole impl depends on it (but better safe than sorry).
The last problem is how to convert a RefStack<'a, T> into a RefStack<'t, T>. I'm pretty sure I could just std::mem::transmute it. But that would be an extra unsafe to pay attention to, and creating a new temporary stack is quite trivial.
About the 't lifetime
You may think that this 't lifetime is not actually needed, but not adding it may cause subtle unsoundness, as the callback could call get() and get values with a lifetime 'a that is actually longer than the inserted value.
For example this code should not compile. With the 't it correctly fails, but without it it compiles and causes undefined behavior:
fn breaking<'a, 's, 'x>(st: &'s mut RefStack<'a, i32>, v: &'x mut Vec<&'a i32>) {
v.push(st.get(0));
}
fn main() {
let mut st = RefStack::<i32>::new();
let mut y = Vec::new();
{
let i = 42;
st.with_element(&i, |stack| breaking(stack, &mut y));
}
println!("{:?}", y);
}
About panic!.
When doing this kind of unsafe thing, particularly when calling user code, as we do in with_element, we have to consider what happens if it panics. In the OP's code, the last object will not be popped, and when the stack is unwound, any drop implementation could see the now-dangling reference. My code is OK in case of panics because, if f(&mut tmp) panics, the dangling references die in the local temporary tmp while self.content is empty.
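A different, commonly used way to get the same panic behaviour is a drop guard, which pops the element in its destructor so the pop also runs during unwinding. The sketch below is not the approach used in the code above. It uses a plain Vec of owned values so it can stay in safe Rust, and PopGuard and this free-standing with_element are names invented for the illustration.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Pops the last element of the borrowed Vec when dropped, including
/// when the drop happens while unwinding from a panic.
struct PopGuard<'v, T>(&'v mut Vec<T>);

impl<T> Drop for PopGuard<'_, T> {
    fn drop(&mut self) {
        self.0.pop();
    }
}

/// Pushes `el`, runs `f` on the stack, and pops `el` again no matter
/// whether `f` returns normally or panics.
fn with_element<T>(stack: &mut Vec<T>, el: T, f: impl FnOnce(&mut Vec<T>)) {
    stack.push(el);
    let guard = PopGuard(stack);
    f(&mut *guard.0);
    // `guard` is dropped here (or during unwinding), which performs the pop.
}

/// Runs a panicking callback and reports the stack length afterwards.
fn len_after_panic() -> usize {
    let mut stack = vec![1, 2];
    let result = catch_unwind(AssertUnwindSafe(|| {
        with_element(&mut stack, 3, |_s| panic!("callback failed"));
    }));
    assert!(result.is_err());
    stack.len()
}
```

The guard guarantees that code running after the unwind never sees the temporarily pushed element. Whether this or the temporary-stack trick reads better is largely a matter of taste.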
Disclaimer: this answer originally used traits, and it was a nightmare; Francis Gagne pointed out rightly that using an Option for the tail was a much better alternative, hence the answer was much simplified.
Given the structure of your usage, with the stack in RefStack following the usage of the stack frames, you can simply put elements on the stack frames and build a stack from that.
The main advantage of such an approach is that it is entirely safe. You can review the whole code here, or follow the blow-by-blow description below.
The key idea is to build a so-called cons-list.
#[derive(Debug)]
struct Stack<'a, T> {
element: &'a T,
tail: Option<&'a Stack<'a, T>>,
}
impl<'a, T> Stack<'a, T> {
fn new(element: &'a T) -> Self { Stack { element, tail: None } }
fn top(&self) -> &T { self.element }
fn get(&self, index: usize) -> Option<&T> {
if index == 0 {
Some(self.element)
} else {
self.tail.and_then(|tail| tail.get(index - 1))
}
}
fn tail(&self) -> Option<&'a Stack<'a, T>> { self.tail }
fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> { Stack { element, tail: Some(self) } }
}
A simple example of usage is:
fn immediate() {
let (a, b, c) = (0, 1, 2);
let root = Stack::new(&a);
let middle = root.push(&b);
let top = middle.push(&c);
println!("{:?}", top);
}
Which just prints the stack, yielding:
Stack { element: 2, tail: Some(Stack { element: 1, tail: Some(Stack { element: 0, tail: None }) }) }
And a more elaborate recursive version:
fn recursive(n: usize) {
fn inner(n: usize, stack: &Stack<'_, i32>) {
if n == 0 {
print!("{:?}", stack);
return;
}
let element = n as i32;
let stacked = stack.push(&element);
inner(n - 1, &stacked);
}
if n == 0 {
println!("()");
return;
}
let element = n as i32;
let root = Stack::new(&element);
inner(n - 1, &root);
}
Which prints:
Stack { element: 1, tail: Some(Stack { element: 2, tail: Some(Stack { element: 3, tail: None }) }) }
The one downside is that get performance may not be so good; it has linear complexity. On the other hand, cache-wise sticking to the stack frames is pretty nice. If you mostly access the first few elements, I expect it'll be good enough.
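If several elements are needed at once, calling get(index) in a loop repeats the walk from the top every time. A possible mitigation, sketched here as an assumption rather than as part of the linked code, is an iterator that follows the tail links once. The iter method and the Iter type are additions for this example, and the Stack definition is repeated (without get) only to keep the snippet self-contained.

```rust
struct Stack<'a, T> {
    element: &'a T,
    tail: Option<&'a Stack<'a, T>>,
}

impl<'a, T> Stack<'a, T> {
    fn new(element: &'a T) -> Self {
        Stack { element, tail: None }
    }

    fn push<'b>(&'b self, element: &'b T) -> Stack<'b, T> {
        Stack { element, tail: Some(self) }
    }

    /// Walks from the top of the stack towards the root in a single pass.
    fn iter(&self) -> Iter<'_, T> {
        Iter { next: Some(self) }
    }
}

/// Iterator over the elements of a `Stack`, top first.
struct Iter<'s, T> {
    next: Option<&'s Stack<'s, T>>,
}

impl<'s, T> Iterator for Iter<'s, T> {
    type Item = &'s T;

    fn next(&mut self) -> Option<&'s T> {
        let node = self.next?;
        // Advance to the frame below; `None` once the root has been yielded.
        self.next = node.tail;
        Some(node.element)
    }
}
```

Visiting all n elements this way takes n steps in total, instead of the roughly n²/2 steps needed when every get starts again from the top.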
Disclaimer: A different answer; with a different trade-off.
Compared to my other answer, this answer presents a solution that is:
- unsafe: it's encapsulated, but subtle.
- simpler to use.
- simpler code, likely faster.
The idea is to still use the stack to bind the lifetimes of the references, yet store all the references in a single Vec for O(1) random access. So we're building a stack on the stack, but not storing the references themselves on the stack. Alright?
The full code is available here.
The stack itself is very easily defined:
struct StackRoot<T: ?Sized>(Vec<*const T>);
struct Stack<'a, T: ?Sized>{
len: usize,
stack: &'a mut Vec<*const T>,
}
impl<T: ?Sized> StackRoot<T> {
fn new() -> Self { Self(vec!()) }
fn stack(&mut self) -> Stack<'_, T> { Stack { len: 0, stack: &mut self.0 } }
}
The implementation of Stack is trickier, as always when unsafe is involved:
impl<'a, T: ?Sized> Stack<'a, T> {
fn len(&self) -> usize { self.len }
fn get(&self, index: usize) -> Option<&'a T> {
if index < self.len {
// Safety:
// - Index is in bounds as per above branch.
// - Lifetime of reference is guaranteed to be at least 'a (see push).
Some(unsafe { &**self.stack.get_unchecked(index) })
} else {
None
}
}
fn push<'b>(&'b mut self, element: &'b T) -> Stack<'b, T>
where
'a: 'b
{
// Stacks could have been built and forgotten, resulting in `self.stack`
// containing references to further elements, so that the newly pushed
// element would not be at index `self.len`, as expected.
//
// Note that on top of being functionally important, it's also a safety
// requirement: `self` should never be able to access elements that are
// not guaranteed to have a lifetime longer than `'a`.
self.stack.truncate(self.len);
self.stack.push(element as *const _);
Stack { len: self.len + 1, stack: &mut *self.stack }
}
}
impl<'a, T: ?Sized> Drop for Stack<'a, T> {
fn drop(&mut self) {
self.stack.truncate(self.len);
}
}
Do note the unsafe here; the invariant is that the 'a parameter is never longer than the lifetimes of the elements pushed onto the stack so far.
By refusing to access elements pushed by other Stack instances, we thus guarantee that the lifetime of the returned reference is valid.
It does require a generic definition of do_recursion, however generic lifetime parameters are erased at code generation, so there's no code bloat involved:
fn do_recursion<'a, 'b>(nodes: &[&'a str], stack: &mut Stack<'b, str>)
where
'a: 'b
{
let tmp: &str = stack.get(stack.len() - 1).expect("Not empty");
println!("{:?}", tmp);
if let [head, tail @ ..] = nodes {
let mut new = stack.push(head);
do_recursion(tail, &mut new);
}
}
A simple main to show it off:
fn main() {
let nodes = ["Hello", ",", "World", "!"];
let mut root = StackRoot::new();
let mut stack = root.stack();
let mut stack = stack.push(nodes[0]);
do_recursion(&nodes[1..], &mut stack);
}
Resulting in:
"Hello"
","
"World"
"!"
Based on rodrigo's answer, I implemented this slightly simpler version:
struct RefStack<'a, T: ?Sized + 'static> {
content: Vec<&'a T>,
}
impl<'a, T: ?Sized + 'static> RefStack<'a, T> {
fn new() -> Self {
RefStack {
content: Vec::new(),
}
}
fn get(&self, index: usize) -> &'a T {
self.content[index]
}
fn len(&self) -> usize {
self.content.len()
}
fn with_element<'t, F>(&mut self, el: &'t T, f: F)
where
F: FnOnce(&mut RefStack<'t, T>),
'a: 't,
{
let mut st = RefStack {
content: std::mem::take(&mut self.content),
};
st.content.push(el);
f(&mut st);
st.content.pop();
self.content = unsafe { std::mem::transmute(st.content) };
}
}
The only difference to rodrigo's solution is that the vector is represented as vector of references instead of pointers, so we don't need the PhantomData and the unsafe code to access an element.
When a new element is pushed to the stack in with_element(), we require that it has a shorter lifetime than the existing elements with the 'a: 't bound. We then create a new stack with the shorter lifetime, which is possible in safe code since we know that the data the references in the vector point to lives for the longer lifetime 'a. We then push the new element with lifetime 't to the new vector, again in safe code, and only after we have removed that element again do we move the vector back to its original place. This requires unsafe code since we are extending the lifetime of the references in the vector from 't to 'a this time. We know this is safe, since the vector is back in its original state, but the compiler doesn't know this.
I feel this version represents the intent better than rodrigo's almost identical version. The type of the vector is always "correct", in that it describes that the elements are actually references, not raw pointers, and it always assigns the correct lifetime to the vector. And we use unsafe code exactly in the place where something potentially unsafe happens: when extending the lifetime of the references in the vector.
I'm filling a vector in parallel, but for this generalized question, I've only found hints and no answers.
The code below works, but I want to switch to Rng::fill instead of iterating over each chunk. It might not be possible to have multiple mutable slices inside a single Vec; I'm not sure.
extern crate rayon;
extern crate rand;
extern crate rand_xoshiro;
use rand::{Rng, SeedableRng};
use rand_xoshiro::Xoshiro256StarStar;
use rayon::prelude::*;
use std::{iter, env};
use std::sync::{Arc, Mutex};
// i16 because I was filling up my RAM for large tests...
fn gen_rand_vec(data: &mut [i16]) {
let num_threads = rayon::current_num_threads();
let mut rng = rand::thread_rng();
let mut prng = Xoshiro256StarStar::from_rng(&mut rng).unwrap();
// lazy iterator of fast, unique RNGs
// Arc and Mutex are just so it can be accessed from multiple threads
let rng_it = Arc::new(Mutex::new(iter::repeat(()).map(|()| {
let new_prng = prng.clone();
prng.jump();
new_prng
})));
// generates random numbers for each chunk in parallel
// par_chunks_mut is parallel version of chunks_mut
data.par_chunks_mut(data.len() / num_threads).for_each(|chunk| {
// I used extra braces because it might be required to unlock Mutex.
// Not sure.
let mut prng = { rng_it.lock().unwrap().next().unwrap() };
for i in chunk.iter_mut() {
*i = prng.gen_range(-1024, 1024);
}
});
}
It turns out that a ChunksMut iterator gives slices. I'm not sure how to glean that from the documentation. I figured it out by reading the source:
#[derive(Debug)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct ChunksMut<'a, T:'a> {
v: &'a mut [T],
chunk_size: usize
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T> Iterator for ChunksMut<'a, T> {
type Item = &'a mut [T];
#[inline]
fn next(&mut self) -> Option<&'a mut [T]> {
if self.v.is_empty() {
None
} else {
let sz = cmp::min(self.v.len(), self.chunk_size);
let tmp = mem::replace(&mut self.v, &mut []);
let (head, tail) = tmp.split_at_mut(sz);
self.v = tail;
Some(head)
}
}
}
I guess it's just experience; to others it must be obvious that an iterator of type ChunksMut<T> created from a Vec<T> yields items of type &mut [T]. That makes sense now. It just wasn't very clear with the intermediate struct.
pub fn chunks_mut(&mut self, chunk_size: usize) -> ChunksMut<T>
// ...
impl<'a, T> Iterator for ChunksMut<'a, T>
Reading this, it looked like the iterator returned objects of type T, the same as Vec<T>.iter(), which wouldn't make sense.
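Since each item is a disjoint &mut [T], any function that fills a slice in one call can be applied per chunk. The sketch below demonstrates that with the standard library only: std::thread::scope and slice::fill are stand-ins chosen for this example, not the rayon and rand calls from the question, and fill_chunks is a made-up name. The same shape should carry over to par_chunks_mut with a per-chunk Rng::fill, but that combination is not shown or tested here.

```rust
use std::thread;

/// Fills `data` in parallel: each worker thread receives its own
/// `&mut [i16]` chunk and fills it with that chunk's index.
fn fill_chunks(data: &mut [i16], workers: usize) {
    let workers = workers.max(1);
    // Round up so there are at most `workers` chunks; a chunk size of 0
    // would make `chunks_mut` panic, hence the `max(1)`.
    let chunk_size = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for (idx, chunk) in data.chunks_mut(chunk_size).enumerate() {
            // The chunks do not overlap, so each one can move into its own thread.
            s.spawn(move || chunk.fill(idx as i16));
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Rounding the chunk size up also sidesteps a panic in the original data.len() / num_threads expression when the slice is shorter than the number of threads.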
This is a very simple example, but how would I do something similar to:
let fact = |x: u32| {
match x {
0 => 1,
_ => x * fact(x - 1),
}
};
I know that this specific example can be easily done with iteration, but I'm wondering if it's possible to make a recursive function in Rust for more complicated things (such as traversing trees) or if I'm required to use my own stack instead.
There are a few ways to do this.
You can put closures into a struct and pass this struct to the closure. You can even define structs inline in a function:
fn main() {
struct Fact<'s> { f: &'s dyn Fn(&Fact, u32) -> u32 }
let fact = Fact {
f: &|fact, x| if x == 0 {1} else {x * (fact.f)(fact, x - 1)}
};
println!("{}", (fact.f)(&fact, 5));
}
This gets around the problem of having an infinite type (a function that takes itself as an argument) and the problem that fact isn't yet defined inside the closure itself when one writes let fact = |x| {...} and so one can't refer to it there.
Another option is to just write a recursive function as a fn item, which can also be defined inline in a function:
fn main() {
fn fact(x: u32) -> u32 { if x == 0 {1} else {x * fact(x - 1)} }
println!("{}", fact(5));
}
This works fine if you don't need to capture anything from the environment.
One more option is to use the fn item solution but explicitly pass the args/environment you want.
fn main() {
struct FactEnv { base_case: u32 }
fn fact(env: &FactEnv, x: u32) -> u32 {
if x == 0 {env.base_case} else {x * fact(env, x - 1)}
}
let env = FactEnv { base_case: 1 };
println!("{}", fact(&env, 5));
}
All of these work with Rust 1.17 and have probably worked since version 0.6. The fn's defined inside fns are no different to those defined at the top level, except they are only accessible within the fn they are defined inside.
As of Rust 1.62 (July 2022), there's still no direct way to recurse in a closure. As the other answers have pointed out, you need at least a bit of indirection, like passing the closure to itself as an argument, or moving it into a cell after creating it. These things can work, but in my opinion they're kind of gross, and they're definitely hard for Rust beginners to follow. If you want to use recursion but you have to have a closure, for example because you need something that implements FnOnce() to use with thread::spawn, then I think the cleanest approach is to use a regular fn function for the recursive part and to wrap it in a non-recursive closure that captures the environment. Here's an example:
let x = 5;
let fact = || {
fn helper(arg: u64) -> u64 {
match arg {
0 => 1,
_ => arg * helper(arg - 1),
}
}
helper(x)
};
assert_eq!(120, fact());
Here's a really ugly and verbose solution I came up with:
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
fn main() {
let weak_holder: Rc<RefCell<Weak<dyn Fn(u32) -> u32>>> =
Rc::new(RefCell::new(Weak::<fn(u32) -> u32>::new()));
let weak_holder2 = weak_holder.clone();
let fact: Rc<dyn Fn(u32) -> u32> = Rc::new(move |x| {
let fact = weak_holder2.borrow().upgrade().unwrap();
if x == 0 {
1
} else {
x * fact(x - 1)
}
});
weak_holder.replace(Rc::downgrade(&fact));
println!("{}", fact(5)); // prints "120"
println!("{}", fact(6)); // prints "720"
}
The advantages of this are that you call the function with the expected signature (no extra arguments needed), it's a closure that can capture variables (by move), it doesn't require defining any new structs, and the closure can be returned from the function or otherwise stored in a place that outlives the scope where it was created (as an Rc<Fn...>) and it still works.
A closure is just a struct with additional context. Therefore, you can do this to achieve recursion (suppose you want to compute a factorial with a recursively updated mutable accumulator):
#[derive(Default)]
struct Fact {
ans: i32,
}
impl Fact {
fn call(&mut self, n: i32) -> i32 {
if n == 0 {
self.ans = 1;
return 1;
}
self.call(n - 1);
self.ans *= n;
self.ans
}
}
To use this struct, just:
let mut fact = Fact::default();
let ans = fact.call(5);