Parallel Recursion Fix - recursion

I'm quite new to Rust and trying to tackle toy problems. I'm trying to write a directory traversal using only Rayon for the parallelism.
struct Node {
path: PathBuf,
files: Vec<PathBuf>,
hashes: Vec<String>,
folders: Vec<Box<Node>>,
}
impl Node {
pub fn new(path: PathBuf) -> Self {
Node {
path: path,
files: Vec::new(),
hashes: Vec::new(),
folders: Vec::new(),
}
}
pub fn burrow(&mut self) {
let mut contents: Vec<PathBuf> = ls_dir(&self.path);
contents.par_iter().for_each(|item|
if item.is_file() {
self.files.push(*item);
} else if item.is_dir() {
let mut new_folder = Node::new(*item);
new_folder.burrow();
self.folders.push(Box::new(new_folder));
});
}
}
The errors I am receiving are
error[E0596]: cannot borrow `*self.files` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:40:37
|
40 | ... self.files.push(*item);
| ^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0507]: cannot move out of `*item` which is behind a shared reference
--> src/main.rs:40:53
|
40 | ... self.files.push(*item);
| ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait
error[E0507]: cannot move out of `*item` which is behind a shared reference
--> src/main.rs:42:68
|
42 | ... let mut new_folder = Node::new(*item);
| ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait
error[E0596]: cannot borrow `*self.folders` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:44:37
|
44 | ... self.folders.push(Box::new(new_folder));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
The errors are clear in that they are preventing different threads from accessing mutable memory, but I'm just not sure how to start to address the errors.
Below is the original (non-parallel) version of burrow
pub fn burrow(&mut self) {
let mut contents: Vec<PathBuf> = ls_dir(&self.path);
for item in contents {
if item.is_file() {
self.files.push(item);
} else if item.is_dir() {
let mut new_folder = Node::new(item);
new_folder.burrow();
self.folders.push(Box::new(new_folder));
}
}
}

The best option in this case is to use ParallelIterator::partition_map() which allows you to turn a parallel iterator into two different collections according to some condition, which is exactly what you need to do.
Example program:
use rayon::iter::{Either, IntoParallelIterator, ParallelIterator};
fn main() {
let input = vec!["a", "bb", "c", "dd"];
let (chars, strings): (Vec<char>, Vec<&str>) =
input.into_par_iter().partition_map(|s| {
if s.len() == 1 {
Either::Left(s.chars().next().unwrap())
} else {
Either::Right(s)
}
});
dbg!(chars, strings);
}
If you had three different outputs, partition_map() would not handle that directly, since it splits into exactly two collections. (Recent Rayon documentation describes nesting Either values to split further; check whether your version supports it.) I haven't looked at whether it'd be possible to build something more general using Rayon's traits, but what I would suggest as a more general (though not quite as efficient) solution is to use channels. A channel like std::sync::mpsc allows any number of threads to insert items while another thread removes them, in your case to move them into a collection. This would not be quite as efficient as parallel collection, but in an IO-dominated problem like yours, the difference would not be significant.
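To make the channel idea concrete, here is a minimal sketch using only the standard library (scoped threads plus std::sync::mpsc) rather than Rayon. The Item enum, classify, and split_three are hypothetical names invented for this illustration, and the three-way split by size is just a stand-in for whatever classification you actually need.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical three-way classification used only for this sketch.
enum Item {
    Small(u32),
    Medium(u32),
    Large(u32),
}

fn classify(n: u32) -> Item {
    if n < 10 {
        Item::Small(n)
    } else if n < 100 {
        Item::Medium(n)
    } else {
        Item::Large(n)
    }
}

// Worker threads classify chunks of the input and send results through a
// channel; the calling thread drains the channel into three collections.
fn split_three(inputs: Vec<u32>, workers: usize) -> (Vec<u32>, Vec<u32>, Vec<u32>) {
    let (tx, rx) = mpsc::channel();
    let workers = workers.max(1);
    let chunk = ((inputs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for part in inputs.chunks(chunk) {
            let tx = tx.clone();
            s.spawn(move || {
                for &n in part {
                    tx.send(classify(n)).unwrap();
                }
            });
        }
        // Drop the original sender so the receive loop ends once all workers finish.
        drop(tx);
        let (mut small, mut medium, mut large) = (vec![], vec![], vec![]);
        for item in rx {
            match item {
                Item::Small(n) => small.push(n),
                Item::Medium(n) => medium.push(n),
                Item::Large(n) => large.push(n),
            }
        }
        (small, medium, large)
    })
}
```

Calling split_three(vec![5, 50, 500], 2) returns the three vectors as a tuple. The order of items within each vector depends on thread scheduling, so sort them if order matters.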

I'm going to skip the separation of files and folders, ignore the structure, and demonstrate a simple recursive approach that gets all the files in a directory recursively:
use std::path::{Path, PathBuf};
fn burrow(dir: &Path) -> Vec<PathBuf> {
let mut contents = vec![];
for entry in std::fs::read_dir(dir).unwrap() {
let entry = entry.unwrap().path();
if entry.is_dir() {
contents.extend(burrow(&entry));
} else {
contents.push(entry);
}
}
contents
}
The first step, if you want to use Rayon's parallel iterators, is to convert this loop into a sequential iterator chain. The best way to do that is with .flat_map(), which flattens closures that yield more than one element per input:
fn burrow(dir: &Path) -> Vec<PathBuf> {
std::fs::read_dir(dir)
.unwrap()
.flat_map(|entry| {
let entry = entry.unwrap().path();
if entry.is_dir() {
burrow(&entry)
} else {
vec![entry] // use a single-element Vec if not a directory
}
})
.collect()
}
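Then, to process this iteration in parallel with Rayon, use .par_bridge() to convert the sequential iterator into a parallel one. Note that par_bridge() does not preserve the original order of the items. And that's it, actually:
use rayon::iter::{ParallelBridge, ParallelIterator};
use std::path::{Path, PathBuf};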
fn burrow(dir: &Path) -> Vec<PathBuf> {
std::fs::read_dir(dir)
.unwrap()
.par_bridge()
.flat_map(|entry| {
let entry = entry.unwrap().path();
if entry.is_dir() {
burrow(&entry)
} else {
vec![entry]
}
})
.collect()
}
You can extend this to collect more complex results (like folders, hashes, and whatever else you need).

Related

How to implement a Future or Stream that polls an async fn?

I have a struct Test for which I want to implement std::future::Future, so that polling Test polls function:
use std::{
future::Future,
pin::Pin,
task::{Context, Poll},
};
struct Test;
impl Test {
async fn function(&mut self) {}
}
impl Future for Test {
type Output = ();
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
match self.function() {
Poll::Pending => Poll::Pending,
Poll::Ready(_) => Poll::Ready(()),
}
}
}
That didn't work:
error[E0308]: mismatched types
--> src/lib.rs:17:13
|
10 | async fn function(&mut self) {}
| - the `Output` of this `async fn`'s expected opaque type
...
17 | Poll::Pending => Poll::Pending,
| ^^^^^^^^^^^^^ expected opaque type, found enum `Poll`
|
= note: expected opaque type `impl Future`
found enum `Poll<_>`
error[E0308]: mismatched types
--> src/lib.rs:18:13
|
10 | async fn function(&mut self) {}
| - the `Output` of this `async fn`'s expected opaque type
...
18 | Poll::Ready(_) => Poll::Ready(()),
| ^^^^^^^^^^^^^^ expected opaque type, found enum `Poll`
|
= note: expected opaque type `impl Future`
found enum `Poll<_>`
I understand that function must be called once, the returned Future must be stored somewhere in the struct, and then the saved future must be polled. I tried this:
struct Test(Option<Box<Pin<dyn Future<Output = ()>>>>);
impl Test {
async fn function(&mut self) {}
fn new() -> Self {
let mut s = Self(None);
s.0 = Some(Box::pin(s.function()));
s
}
}
That also didn't work:
error[E0277]: the size for values of type `(dyn Future<Output = ()> + 'static)` cannot be known at compilation time
--> src/lib.rs:7:13
|
7 | struct Test(Option<Box<Pin<dyn Future<Output = ()>>>>);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `(dyn Future<Output = ()> + 'static)`
Once I call function(), the returned future holds a &mut reference to Test. Because of that, I can't modify the Test variable, and therefore can't store the returned Future inside the Test.
I did get an unsafe solution (inspired by this)
struct Test<'a>(Option<BoxFuture<'a, ()>>);
impl Test<'_> {
async fn function(&mut self) {
println!("I'm alive!");
}
fn new() -> Self {
let mut s = Self(None);
s.0 = Some(unsafe { &mut *(&mut s as *mut Self) }.function().boxed());
s
}
}
impl Future for Test<'_> {
type Output = ();
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
self.0.as_mut().unwrap().poll_unpin(cx)
}
}
I hope that there is another way.
Though there are times when you may want to do things similar to what you're trying to accomplish here, they are a rarity. So most people reading this, maybe even OP, may wish to restructure such that struct state and data used for a single async execution are different objects.
To answer your question: yes, it is somewhat possible. Unless you absolutely want to resort to unsafe code, you will need to use Mutex and Arc. All fields you wish to manipulate inside the async fn will have to be wrapped in a Mutex, and the function itself will accept an Arc<Self>.
I must stress, however, that this is not a beautiful solution and you probably don't want to do this. Depending on your specific case your solution may vary, but my guess of what OP is trying to accomplish while using Streams would be better solved by something similar to this gist that I wrote.
use std::{
future::Future,
pin::Pin,
sync::{Arc, Mutex},
};
struct Test {
state: Mutex<Option<Pin<Box<dyn Future<Output = ()>>>>>,
// if available use your async library's Mutex to `.await` locks on `buffer` instead
buffer: Mutex<Vec<u8>>,
}
impl Test {
async fn function(self: Arc<Self>) {
for i in 0..16u8 {
let data: Vec<u8> = vec![i]; // = fs::read(&format!("file-{}.txt", i)).await.unwrap();
let mut buflock = self.buffer.lock().unwrap();
buflock.extend_from_slice(&data);
}
}
pub fn new() -> Arc<Self> {
let s = Arc::new(Self {
state: Default::default(),
buffer: Default::default(),
});
{
// start by trying to acquire a lock on the Mutex holding the boxed future
let mut lock = s.state.lock().unwrap();
// create boxed future
let b = Box::pin(s.clone().function());
// insert value into the mutex
*lock = Some(b);
} // block causes the lock to be released
s
}
}
impl Future for Test {
type Output = ();
fn poll(
self: std::pin::Pin<&mut Self>,
ctx: &mut std::task::Context<'_>,
) -> std::task::Poll<<Self as std::future::Future>::Output> {
let mut lock = self.state.lock().unwrap();
let fut: &mut Pin<Box<dyn Future<Output = ()>>> = lock.as_mut().unwrap();
Future::poll(fut.as_mut(), ctx)
}
}
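For comparison, here is a rough sketch of the restructuring recommended at the start of this answer, where the data used by one async execution lives in a separate object from the struct that stores the future. Worker, NoopWake, and poll_once are hypothetical names made up for this illustration, and poll_once is only a stand-in for a real executor. Because the future owns its Worker instead of borrowing Test, no Mutex, Arc, or unsafe is needed. Note also that the stored type is Pin<Box<dyn Future>>, not Box<Pin<dyn Future>> as in the question.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Data used by a single async execution, kept apart from the wrapper below.
struct Worker {
    count: u32,
}

impl Worker {
    // Takes `self` by value, so the returned future owns its data.
    async fn function(mut self) -> u32 {
        self.count += 1;
        self.count
    }
}

// Wrapper that stores the boxed, pinned future and forwards `poll` to it.
struct Test {
    fut: Pin<Box<dyn Future<Output = u32>>>,
}

impl Test {
    fn new() -> Self {
        Test {
            fut: Box::pin(Worker { count: 0 }.function()),
        }
    }
}

impl Future for Test {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.fut.as_mut().poll(cx)
    }
}

// Waker that does nothing; enough to poll by hand in this sketch.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls an `Unpin` future exactly once (illustration only, not an executor).
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

In real code you would simply .await a Test inside your runtime; poll_once exists only so the sketch can be exercised without an async library.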
I'm not sure what you want to achieve and why, but I suspect that you're trying to implement Future for Test based on some ancient tutorial or misunderstanding and just overcomplicating things.
You don't have to implement Future manually. An async function
async fn function(...) {...}
is really just syntax sugar translated behind the scenes into something like
fn function(...) -> impl Future<Output = ()> {...}
All you have to do is use the result of the function the same way as any other future, e.g. use .await on it or block on it in an executor until it's finished. E.g. based on your first version, you can simply call:
let mut test = Test{};
test.function().await;
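To illustrate the desugaring claim, here is a small sketch that puts an async fn next to a hand-written function returning impl Future and drives both with the same code. The names function_desugared, NoopWake, and block_on are hypothetical. This block_on is a toy busy-polling loop built from the standard library, not how a real runtime schedules work, and the equivalence shown is approximate rather than the exact compiler output.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Test {
    count: u32,
}

impl Test {
    // The async fn from the question, with a visible side effect added.
    async fn function(&mut self) {
        self.count += 1;
    }

    // Roughly what the compiler turns the async fn into: an ordinary
    // function that returns a future borrowing `self`.
    fn function_desugared(&mut self) -> impl Future<Output = ()> + '_ {
        async move {
            self.count += 1;
        }
    }
}

// Waker that does nothing; sufficient for a busy-polling loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy executor: polls the future until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Both block_on(test.function()) and block_on(test.function_desugared()) increment count in the same way, which is the point: calling an async fn just hands you a future.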
UPDATE1
Based on your descriptions, I still think you're overcomplicating this. Here is a minimal working snippet that does not need a manual Future implementation for anything:
async fn asyncio() { println!("Doing async IO"); }
struct Test {
count: u32,
}
impl Test {
async fn function(&mut self) {
asyncio().await;
self.count += 1;
}
}
#[tokio::main]
async fn main() {
let mut test = Test{count: 0};
test.function().await;
println!("Count: {}", test.count);
}

Why does a node in a linked list using raw pointers become corrupted?

I am struggling to learn raw pointers while implementing a linked list. A simple piece of code gives me unintended results for which I struggle to find any explanation whatsoever:
use std::cmp::PartialEq;
use std::default::Default;
use std::ptr;
pub struct LinkedListElement<T> {
pub data: T,
pub next: *mut LinkedListElement<T>,
}
pub struct LinkedList<T> {
head: *mut LinkedListElement<T>,
}
impl<T: PartialEq> LinkedListElement<T> {
pub fn new(elem: T, next: Option<*mut LinkedListElement<T>>) -> LinkedListElement<T> {
let mut_ptr = match next {
Some(t) => t,
None => ptr::null_mut(),
};
let new_elem = LinkedListElement {
data: elem,
next: mut_ptr,
};
if !mut_ptr.is_null() {
println!(
"post create ll mut ptr: {:p}, post create ll mut ptr next {:p}",
mut_ptr,
unsafe { (*mut_ptr).next }
);
}
new_elem
}
}
impl<T: PartialEq + Default> LinkedList<T> {
pub fn new(elem: T) -> LinkedList<T> {
LinkedList {
head: &mut LinkedListElement::new(elem, None),
}
}
pub fn insert(&mut self, elem: T) {
println!("head: {:p} . next: {:p}", self.head, unsafe {
(*self.head).next
});
let next = Some(self.head);
let mut ll_elem = LinkedListElement::new(elem, next);
println!(
"before pointer head: {:p}. before pointer next {:p}",
self.head,
unsafe { (*self.head).next }
);
let ll_elem_ptr = &mut ll_elem as *mut LinkedListElement<T>;
self.head = ll_elem_ptr;
}
}
fn main() {
let elem: i32 = 32;
let second_elem: i32 = 64;
let third_elem: i32 = 72;
let mut list = LinkedList::new(elem);
list.insert(second_elem);
list.insert(third_elem);
}
This code gives me the following output:
head: 0x7ffe163275e8 . next: 0x0
post create ll mut ptr: 0x7ffe163275e8, post create ll mut ptr next 0x0
before pointer head: 0x7ffe163275e8. before pointer next 0x0
head: 0x7ffe16327560 . next: 0x7ffe163275e8
post create ll mut ptr: 0x7ffe16327560, post create ll mut ptr next 0x7ffe163275e8
before pointer head: 0x7ffe16327560. before pointer next 0x7ffe16327560
For the first 2 elements the code behaves as expected: it creates an element with null pointer as its next element. Here is the state of things after adding second element:
{
head: {
elem: 64,
next: {
elem: 32,
next: nullptr
}
}
}
64 -> 32 -> null
When the third element is added, things become weird and the linked list transforms into something like this:
{
head: {
elem: 72,
next: {
elem: 72,
next: {
elem: 72,
next: ...
}
}
}
}
72 -> 72 -> 72 -> ...
It seems that the linked list element's next field starts pointing at the element itself.
I have debugged the LinkedListElement::new method and found that the proper element should get returned from it:
{
elem: 72,
next: {
elem: 64,
next: {
elem: 32,
next: nullptr
}
}
}
For some reason, immediately after it is returned to LinkedList::insert method, even before self.head is reassigned, the contents of LinkedList self becomes "corrupted".
I know using raw pointers in Rust is not idiomatic but I still want to learn them.
Congratulations, you have successfully proven why Rust needs to exist in the first place: programmers write memory-unsafe code.
First, please read why this is disallowed when using safe Rust:
Is there any way to return a reference to a variable created in a function?
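TL;DR: the memory address of LinkedListElement changes when it's moved. A move occurs when a value is returned from a function (among other times). In your code, LinkedList::new stores the address of a temporary and insert stores the address of the local ll_elem; both are gone once the function returns, so head dangles and later calls reuse the same stack memory. By using a raw pointer, you've subverted the borrow checker and get no useful feedback from the compiler.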
Second, please read Learning Rust With Entirely Too Many Linked Lists. For whatever reason, programmers think that linked lists are "easy" and a good way to learn a language. This is generally not true in Rust, where memory safety is paramount.
TL;DR: you can use a Box to allocate memory on the heap. This memory address will not change when the pointer to it is moved. You will need to ensure that you appropriately free the pointer when your linked list goes out of scope to prevent memory leaks.
See also:
How to copy a raw pointer when implementing a linked list in Rust?
Box::into_raw / Box::from_raw
NonNull
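As a rough illustration of the Box advice above, here is a minimal sketch of a raw-pointer list whose nodes are heap-allocated with Box::into_raw and freed with Box::from_raw. The Node name and the to_vec helper are hypothetical additions for this example, and the list is intentionally bare (insert at head only).

```rust
use std::ptr;

pub struct Node<T> {
    data: T,
    next: *mut Node<T>,
}

pub struct LinkedList<T> {
    head: *mut Node<T>,
}

impl<T> LinkedList<T> {
    pub fn new() -> Self {
        LinkedList { head: ptr::null_mut() }
    }

    // Allocates the node on the heap; its address stays stable even though
    // the pointer to it is copied around.
    pub fn insert(&mut self, elem: T) {
        let node = Box::new(Node { data: elem, next: self.head });
        self.head = Box::into_raw(node);
    }

    // Walks the list from head to tail, collecting references to the data.
    pub fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while !cur.is_null() {
            // Safety: every non-null pointer in the list came from
            // Box::into_raw in `insert` and has not been freed yet.
            unsafe {
                out.push(&(*cur).data);
                cur = (*cur).next;
            }
        }
        out
    }
}

impl<T> Drop for LinkedList<T> {
    // Rebuilds each Box so that every node is freed exactly once.
    fn drop(&mut self) {
        let mut cur = self.head;
        while !cur.is_null() {
            // Safety: same invariant as in `to_vec`.
            let boxed = unsafe { Box::from_raw(cur) };
            cur = boxed.next;
        }
    }
}
```

Inserting 32, 64, and 72 in that order gives 72 -> 64 -> 32 -> null, which is what the question expected. The Drop implementation is what prevents the leak mentioned above.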

Can't use Vec two times and I can't borrow it instead

I attempted to implement the Rosetta Code password generator:
extern crate rand;
use rand::prelude::*;
fn main() {
println!("Hello, world!");
let p = generate_password(12, 5);
for i in p.iter() {
println!("{:?}", i);
}
}
fn generate_password(length: i32, number: i32) -> Vec<Vec<String>> {
let lowercase = "abcdefghijklmnopqrstuvwxyz";
let uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
let listnumber = "0123456789";
let other = "!\\\"#$%&'()*+,-./:;<=>?#[]^_{|}~";
let all: Vec<char> = String::from(format!("{}{}{}{}", lowercase, uppercase, listnumber, other))
.chars()
.collect();
let mut password: Vec<String> = Vec::new();
let mut password_list: Vec<Vec<String>> = Vec::new();
for num in 1..number {
for l in 1..length {
password.push(String::from(thread_rng().choose(&all).unwrap().to_string()));
}
password_list.push(&password);
}
return password_list;
}
Rust won't allow me to use either borrowed value or direct value:
error[E0308]: mismatched types
--> src/main.rs:26:28
|
26 | password_list.push(&password);
| ^^^^^^^^^
| |
| expected struct `std::vec::Vec`, found reference
| help: consider removing the borrow: `password`
|
= note: expected type `std::vec::Vec<std::string::String>`
found type `&std::vec::Vec<std::string::String>`
The help message says I should remove the borrow because of the type mismatch, but I still get an error after removing it, because the value has been moved.
You've declared a type to be Vec<Vec<String>>, but you're trying to store a reference inside it.
When you remove the reference, you're getting a different error because push takes ownership of the value, so the original variable can no longer be used. But you then try to use it in the subsequent loop. The easy fix is to declare the variable inside the loop, so it is a new variable each time:
let mut password_list = Vec::new();
for num in 1..number {
let mut password = Vec::new();
for l in 1..length {
password.push(String::from(thread_rng().choose(&all).unwrap().to_string()));
}
password_list.push(password);
}
Note that you don't need a lot of the type annotations, especially on local function variables. The compiler can infer them, which makes the code a lot cleaner.
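If you would rather keep one buffer declared outside the loop, std::mem::take is another way to satisfy the borrow checker: it moves the contents out and leaves an empty value behind for the next iteration. The sketch below is an assumption-laden illustration, not the Rosetta Code solution. build_passwords is a hypothetical name, it cycles through the alphabet instead of using rand so that it stays standard-library only, and it collects each password into a String rather than a Vec<String>. It also uses 0..number, since 1..number in the code above yields one item fewer than requested.

```rust
// Builds `number` passwords of `length` characters each.
// Assumption: characters are taken by cycling through `alphabet`; swap in a
// real random choice for actual password generation.
fn build_passwords(alphabet: &[char], length: usize, number: usize) -> Vec<String> {
    let mut next = alphabet.iter().cycle();
    let mut password = String::new(); // buffer declared outside the loop
    let mut password_list = Vec::new();
    for _ in 0..number {
        for _ in 0..length {
            password.push(*next.next().expect("alphabet must not be empty"));
        }
        // `take` moves the finished password out and leaves an empty String,
        // so `password` is still usable in the next iteration.
        password_list.push(std::mem::take(&mut password));
    }
    password_list
}
```

Declaring the variable inside the loop, as shown above, is still the simpler fix; take is mainly useful when the buffer has to outlive the loop for some other reason.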

How do you replace the value of a mutable variable by taking ownership of it?

I am working with a LinkedList and I want to remove all elements which do not pass a test. However, I am running into the error cannot move out of borrowed content.
From what I understand, this is because I am working with &mut self, so I do not have the right to invalidate (i.e. move) one of the contained values even for a moment to construct a new list of its values.
In C++/Java, I would simply iterate the list and remove any elements which match a criterion. As I have not yet found a remove method, I have interpreted it as an iterate, filter, and collect.
The goal is to avoid creating a temporary list, cloning values, and needing to take self and return a "new" object. I have constructed an example which produces the same error.
use std::collections::LinkedList;
#[derive(Debug)]
struct Example {
list: LinkedList<i8>,
// Other stuff here
}
impl Example {
pub fn default() -> Example {
let mut list = LinkedList::new();
list.push_back(-5);
list.push_back(3);
list.push_back(-1);
list.push_back(6);
Example { list }
}
// Similar idea, but with creating a new list
pub fn get_positive(&self) -> LinkedList<i8> {
self.list.iter()
.filter(|&&x| x > 0)
.map(|x| x.clone())
.collect()
}
// Now, attempt to filter the elements without cloning anything
pub fn remove_negative(&mut self) {
self.list = self.list.into_iter()
.filter(|&x| x > 0)
.collect()
}
}
fn main() {
let mut e = Example::default();
println!("{:?}", e.get_positive());
println!("{:?}", e);
}
In my actual case, I cannot simply consume the wrapping object because it needs to be referenced from different places and contains other important values.
In my research, I found some unsafe code which leads me to question if a safe function could be constructed to perform this action in a similar way to std::mem::replace.
You can std::mem::swap your field with a temp, and then replace it with your modified list like this. The big downside is the creation of the new LinkedList. I don't know how expensive that is.
pub fn remove_negative(&mut self) {
let mut temp = LinkedList::new();
std::mem::swap(&mut temp, &mut self.list);
self.list = temp.into_iter()
.filter(|&x| x > 0)
.collect();
}
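On the cost question: LinkedList::new() creates an empty list without allocating, so the temporary is cheap. The same swap can also be written with std::mem::take, which replaces the field with its Default value (an empty list) and hands back the old contents. A minimal sketch, reusing the Example struct from the question with its other fields omitted:

```rust
use std::collections::LinkedList;

struct Example {
    list: LinkedList<i8>,
}

impl Example {
    // Moves the list out of `self`, filters it, and stores the result back.
    // Elements are moved, not cloned. Like the original, this keeps only
    // strictly positive values, so zero is removed as well.
    pub fn remove_negative(&mut self) {
        self.list = std::mem::take(&mut self.list)
            .into_iter()
            .filter(|&x| x > 0)
            .collect();
    }
}
```

The nodes of the old list are still freed and new ones allocated by collect(), so this avoids cloning the values but not the per-node allocation.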
If the goal is to avoid cloning the values themselves, you may use a reference-counting pointer: the clone method on Rc only increments the reference counter.
use std::collections::LinkedList;
use std::rc::Rc;
#[derive(Debug)]
struct Example {
list: LinkedList<Rc<i8>>,
// ...
}
impl Example {
pub fn default() -> Example {
let mut list = LinkedList::new();
list.push_back(Rc::new(-5));
list.push_back(Rc::new(3));
list.push_back(Rc::new(-1));
list.push_back(Rc::new(6));
Example { list }
}
// Similar idea, but with creating a new list
pub fn get_positive(&self) -> LinkedList<Rc<i8>> {
self.list.iter()
.filter(|&x| x.as_ref() > &0)
.map(|x| x.clone())
.collect()
}
// Now, attempt to filter the elements without cloning anything
pub fn remove_negative(&mut self) {
self.list = self.list.iter()
.filter(|&x| x.as_ref() > &0)
.map(|x| x.clone())
.collect()
}
}
fn main() {
let mut e = Example::default();
e.remove_negative();
println!("{:?}", e.get_positive());
println!("{:?}", e);
}

Returning Error Enumeration with an Arbitrary Variable

I have a function in Rust using try! that attempts to collect all files in a directory recursively and insert them into a vector. Because the function uses try! to check errors, the compiler seems to expect an io::Result return from the function, and doesn't let me include the vector because the try! macro only returns a result. I need the vector to be returned.
Code is as follows:
mod os{
use std::io;
use std::fs::{self, DirEntry};
//use std::fs;
use std::path::Path;
// one possible implementation of walking a directory only visiting files
pub fn visit_dirs(dir: &Path, cb: &Fn(&DirEntry)) -> (io::Result<()>,Vec<String>) {
let mut filevec: Vec<String> = Vec::new();
if try!(fs::metadata(dir)).is_dir() {
for entry in try!(fs::read_dir(dir)) {
let entry = try!(entry);
if try!(fs::metadata(entry.path())).is_dir() {
try!(visit_dirs(&entry.path(), cb));
} else {
cb(&entry);
}
}
}
(Ok(()),filevec)
}
fn push_path_to_vec(p:&DirEntry,v:Vec<String>){
v.push(p.path().to_str().unwrap().to_string());
}}
Here is the error:
<std macros>:5:8: 6:42 error: mismatched types:
expected `(core::result::Result<(), std::io::error::Error>, collections::vec::Vec<collections::string::String>)`
found `core::result::Result<_, _>`
(expected tuple,
found enum `core::result::Result`) [E0308]
I wonder if there's any idiomatic way to do this that I've missed.
The return type of visit_dirs is wrong. The function should return a Result, but right now it returns a tuple. Since try! only works for functions returning a Result, your code doesn't compile. You can change the return value of visit_dirs in order to fix it:
pub fn visit_dirs(dir: &Path, cb: &Fn(&DirEntry)) -> io::Result<Vec<String>>
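The new definition means that a Vec<String> will be stored in the Result upon success. With some minor tweaks, the code is accepted by the compiler (see below). Note that this only fixes the type error: nothing pushes into filevec yet, the result of the recursive call is discarded, and push_path_to_vec takes its vector by value, so anything it pushes is lost when it returns.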
mod os{
use std::io;
use std::fs::{self, DirEntry};
//use std::fs;
use std::path::Path;
// one possible implementation of walking a directory only visiting files
pub fn visit_dirs(dir: &Path, cb: &Fn(&DirEntry)) -> io::Result<Vec<String>> {
let mut filevec: Vec<String> = Vec::new();
if try!(fs::metadata(dir)).is_dir() {
for entry in try!(fs::read_dir(dir)) {
let entry = try!(entry);
if try!(fs::metadata(entry.path())).is_dir() {
try!(visit_dirs(&entry.path(), cb));
} else {
cb(&entry);
}
}
}
Ok(filevec)
}
fn push_path_to_vec(p:&DirEntry,mut v:Vec<String>){
v.push(p.path().to_str().unwrap().to_string());
}}
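For readers on a current toolchain, here is a hedged sketch of how the same idea might look with the ? operator, which has replaced try!. It departs from the code above in a few ways that are my own choices rather than part of the original answer: it drops the callback, returns PathBuf values instead of String, and actually fills the vector, including the results of recursive calls.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursively collects every non-directory entry under `dir`.
// Any I/O error is propagated to the caller through `?`.
pub fn visit_dirs(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    if fs::metadata(dir)?.is_dir() {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if fs::metadata(&path)?.is_dir() {
                // Merge the files found in the subdirectory.
                files.extend(visit_dirs(&path)?);
            } else {
                files.push(path);
            }
        }
    }
    Ok(files)
}
```

The caller decides what to do with a failure, for example match visit_dirs(Path::new(".")) { Ok(files) => ..., Err(e) => ... }. The order of the returned paths follows whatever order read_dir yields, which is platform-dependent.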

Resources