Extending a vector asynchronously without temporary initialisation - asynchronous

Assume I have an async function f(usize) -> Future<Output = T> and a vector v of length n_old such that v[i] = f(i).await. I would now like to extend this vector to length n_new while maintaining the v[i] = f(i).await invariant, and I would like to exploit the asynchronous nature of f while doing so. What's the best way to do this?
If T implements Default, then one way to do what I want is the following:
v.resize_with(n_new, Default::default);
futures::stream::iter((n_old..n_new).zip(v[n_old..n_new].iter_mut()))
.for_each_concurrent(None, |(i, vi)| async move {
*vi = f(i).await;
})
.await;
However, this feels a bit overly convoluted. Is there a more elegant approach?
Complete MWE: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e050df38de58d6376b103bb4d25afb6e

Related

How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate]

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

How to invoke a multi-argument function without creating a closure?

I came across this while doing the 2018 Advent of Code (Day 2, Part 1) solution in Rust.
The problem to solve:
Take the count of strings that have exactly two of the same letter, multiplied by the count of strings that have exactly three of the same letter.
INPUT
abcdega
hihklmh
abqasbb
aaaabcd
The first string abcdega has a repeated twice.
The second string hihklmh has h repeated three times.
The third string abqasbb has a repeated twice, and b repeated three times, so it counts for both.
The fourth string aaaabcd contains a letter repeated 4 times (not 2, or 3) so it does not count.
So the result should be:
2 strings that contained a double letter (first and third) multiplied by 2 strings that contained a triple letter (second and third) = 4
The Question:
const PUZZLE_INPUT: &str =
"
abcdega
hihklmh
abqasbb
aaaabcd
";
fn letter_counts(id: &str) -> [u8;26] {
id.chars().map(|c| c as u8).fold([0;26], |mut counts, c| {
counts[usize::from(c - b'a')] += 1;
counts
})
}
fn has_repeated_letter(n: u8, letter_counts: &[u8;26]) -> bool {
letter_counts.iter().any(|&count| count == n)
}
fn main() {
let ids_iter = PUZZLE_INPUT.lines().map(letter_counts);
let num_ids_with_double = ids_iter.clone().filter(|id| has_repeated_letter(2, id)).count();
let num_ids_with_triple = ids_iter.filter(|id| has_repeated_letter(3, id)).count();
println!("{}", num_ids_with_double * num_ids_with_triple);
}
Rust Playground
Consider line 21. The function letter_counts takes only one argument, so I can use the syntax: .map(letter_counts) on elements that match the type of the expected argument. This is really nice to me! I love that I don't have to create a closure: .map(|id| letter_counts(id)). I find both to be readable, but the former version without the closure is much cleaner to me.
Now consider lines 22 and 23. Here, I have to use the syntax: .filter(|id| has_repeated_letter(3, id)) because the has_repeated_letter function takes two arguments. I would really like to do .filter(has_repeated_letter(3)) instead.
Sure, I could make the function take a tuple instead, map to a tuple and consume only a single argument... but that seems like a terrible solution. I'd rather just create the closure.
Leaving out the only argument is something that Rust lets you do. Why would it be any harder for the compiler to let you leave out the last argument, provided that it has all of the other n-1 arguments for a function that takes n arguments.
I feel like this would make the syntax a lot cleaner, and it would fit in a lot better with the idiomatic functional style that Rust prefers.
I am certainly no expert in compilers, but implementing this behavior seems like it would be straightforward. If my thinking is incorrect, I would love to know more about why that is so.
No, you cannot pass a function with multiple arguments as an implicit closure.
In certain cases, you can choose to use currying to reduce the arity of a function. For example, here we reduce the add function from 2 arguments to one:
fn add(a: i32, b: i32) -> i32 {
a + b
}
fn curry<A1, A2, R>(f: impl FnOnce(A1, A2) -> R, a1: A1) -> impl FnOnce(A2) -> R {
move |a2| f(a1, a2)
}
fn main() {
let a = Some(1);
a.map(curry(add, 2));
}
However, I agree with the comments that this isn't a benefit:
It's not any less typing:
a.map(curry(add, 2));
a.map(|v| add(v, 2));
The curry function is extremely limited: it chooses to use FnOnce, but Fn and FnMut also have use cases. It only applies to a function with two arguments.
However, I have used this higher-order function trick in other projects, where the amount of code that is added is much greater.

Recursive function that returns a Vec

I continue to struggle with the concept of recursion. I have a function that takes a u64 and returns a Vec<u64> of factors of that integer. I would like to recursively call this function on each item in the Vec, returning a flattened Vec until the function returns Vec<self> for each item, i.e., each item is prime.
fn prime_factors(x: u64) -> Vec<u64> {
let factors = factoring_method(x);
factors.iter().flat_map(|&i| factoring_method(i)).collect()
}
(The complete code)
This returns only the Vec of factors of the final iteration and also has no conditional that allows it to keep going until items are all prime.
The factoring_method is a congruence of squares that I'm pretty happy with. I'm certain there's lots of room for optimization, but I'm hoping to get a working version complete before refactoring. I think the recursion should in the congruence_of_squares — calling itself upon each member of the Vec it returns, but I'm not sure how to frame the conditional to keep it from doing so infinitely.
Useful recursion requires two things:
That a function call itself, either directly or indirectly.
That there be some terminating case.
One definition of the prime factorization of a number is:
if the number is prime, that is the only prime factor
otherwise, combine the prime factors of a pair of factors of the number
From that, we can identify a termination condition ("if it's prime") and the recursive call ("prime factors of factors").
Note that we haven't written any code yet — everything to this point is conceptual.
We can then transcribe the idea to Rust:
fn prime_factors(x: u64) -> Vec<u64> {
if is_prime(x) {
vec![x]
} else {
factors(x).into_iter().flat_map(prime_factors).collect()
}
}
Interesting pieces here:
We use into_iter to avoid the need to dereference the iterated value.
We can pass the function name directly as the closure because the types align.
Some (inefficient) helper functions round out the implementation:
fn is_prime(x: u64) -> bool {
!(2..x).any(|i| x % i == 0)
}
fn factors(x: u64) -> Vec<u64> {
match (2..x).filter(|i| x % i == 0).next() {
Some(v) => vec![v, x / v],
None => vec![],
}
}

Concatenate a vector of vectors of strings

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.
The best I could do so far has been the following:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.
I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.
So let me try and trace the types of each expression:
- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore
vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.
I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
let c = b.collect();
print_type_of(&c);
let vals : Vec<&String> = c;
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Clearly, U=Iter<String>... Please help me clear up this mess.
EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.
EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
let vals : Vec<&String> = b.collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.
We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.
To minimize time spent growing and reallocating the vector, calculate the space needed up front.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let size = vecs.iter().fold(0, |a, b| a + b.len());
vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
acc.extend(v); acc
})
}
If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */
The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)
vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()

Is there a way of providing a final transform method when chaining operations (like map reduce) in underscore.js?

(Really strugging to title this question, so if anyone has suggestions feel free.)
Say I wanted to do an operation like:
take an array [1,2,3]
multiply each element by 2 (map): [2,4,6]
add the elements together (reduce): 12
multiply the result by 10: 120
I can do this pretty cleanly in underscore using chaining, like so:
arr = [1,2,3]
map = (el) -> 2*el
reduce = (s,n) -> s+n
out = (r) -> 10*r
reduced = _.chain(arr).map(map).reduce(reduce).value()
result = out(reduced)
However, it would be even nicer if I could chain the 'out' method too, like this:
result = _.chain(arr).map(map).reduce(reduce).out(out).value()
Now this would be a fairly simple addition to a library like underscore. But my questions are:
Does this 'out' method have a name in functional programming?
Does this already exist in underscore (tap comes close, but not quite).
This question got me quite hooked. Here are some of my thoughts.
It feels like using underscore.js in 'chain() mode' breaks away from functional programming paradigm. Basically, instead of calling functions on functions, you're calling methods of an instance of a wrapper object in an OOP way.
I am using underscore's chain() myself here and there, but this question made me think. What if it's better to simply create more meaningful functions that can then be called in a sequence without having to use chain() at all. Your example would then look something like this:
arr = [1,2,3]
double = (arr) -> _.map(arr, (el) -> 2 * el)
sum = (arr) -> _.reduce(arr, (s, n) -> s + n)
out = (r) -> 10 * r
result = out sum double arr
# probably a less ambiguous way to do it would be
result = out(sum(double arr))
Looking at real functional programming languages (as in .. much more functional than JavaScript), it seems you could do exactly the same thing there in an even simpler manner. Here is the same program written in Standard ML. Notice how calling map with only one argument returns another function. There is no need to wrap this map in another function like we did in JavaScript.
val arr = [1,2,3];
val double = map (fn x => 2*x);
val sum = foldl (fn (a,b) => a+b) 0;
val out = fn r => 10*r;
val result = out(sum(double arr))
Standard ML also lets you create operators which means we can make a little 'chain' operator that can be used to call those functions in a more intuitive order.
infix 1 |>;
fun x |> f = f x;
val result = arr |> double |> sum |> out
I also think that this underscore.js chaining has something similar to monads in functional programming, but I don't know much about those. Though, I have feeling that this kind of data manipulation pipeline is not something you would typically use monads for.
I hope someone with more functional programming experience can chip in and correct me if I'm wrong on any of the points above.
UPDATE
Getting slightly off topic, but one way to creating partial functions could be the following:
// extend underscore with partialr function
_.mixin({
partialr: function (fn, context) {
var args = Array.prototype.slice.call(arguments, 2);
return function () {
return fn.apply(context, Array.prototype.slice.call(arguments).concat(args));
};
}
});
This function can now be used to create a partial function from any underscore function, because most of them take the input data as the first argument. For example, the sum function can now be created like
var sum = _.partialr(_.reduce, this, function (s, n) { return s + n; });
sum([1,2,3]);
I still prefer arr |> double |> sum |> out over out(sum(double(arr))) though. Underscore's chain() is nice in that it reads in a more natural order.
In terms of the name you are looking for, I think what you are trying to do is just a form of function application: you have an underscore object and you want to apply a function to its value. In underscore, you can define it like this:
_.mixin({
app: function(v, f) { return f (v); }
});
then you can pretty much do what you asked for:
var arr = [1,2,3];
function m(el) { return 2*el; };
function r(s,n) { return s+n; };
function out(r) { return 10*r; };
console.log("result: " + _.chain(arr).map(m).reduce(r).app(out).value()));
Having said all that, I think using traditional typed functional languages like SML make this kind of think a lot slicker and give much lighter weight syntax for function composition. Underscore is doing a kind of jquery twist on functional programming that I'm not sure what I think of; but without static-type checking it is frustratingly easy to make errors!

Resources