What is an alternative to Kotlin's `reduce` operation in Rust? - collections

I encountered this competitive programming problem:
nums is a vector of integers (length n)
ops is a vector of strings containing + and - (length n-1)
It can be solved with the reduce operation in Kotlin like this:
val op_iter = ops.iterator();
nums.reduce {a, b ->
when (op_iter.next()) {
"+" -> a+b
"-" -> a-b
else -> throw Exception()
}
}
reduce is described as:
Accumulates value starting with the first element and applying operation from left to right to current accumulator value and each element.
It looks like Rust vectors do not have a reduce method. How would you achieve this task?

Edited: since Rust version 1.51.0, this function is called reduce
Be aware of similar function which is called fold. The difference is that reduce will produce None if iterator is empty while fold accepts accumulator and will produce accumulator's value if iterator is empty.
Outdated answer is left to capture the history of this function debating how to name it:
There is no reduce in Rust 1.48. In many cases you can simulate it with fold but be aware that the semantics of the these functions are different. If the iterator is empty, fold will return the initial value whereas reduce returns None. If you want to perform multiplication operation on all elements, for example, getting result 1 for empty set is not too logical.
Rust does have a fold_first function which is equivalent to Kotlin's reduce, but it is not stable yet. The main discussion is about naming it. It is a safe bet to use it if you are ok with nightly Rust because there is little chance the function will be removed. In the worst case, the name will be changed. If you need stable Rust, then use fold if you are Ok with an illogical result for empty sets. If not, then you'll have to implement it, or find a crate such as reduce.

Kotlin's reduce takes the first item of the iterator for the starting point while Rust's fold and try_fold let you specify a custom starting point.
Here is an equivalent of the Kotlin code:
let mut it = nums.iter().cloned();
let start = it.next().unwrap();
it.zip(ops.iter()).try_fold(start, |a, (b, op)| match op {
'+' => Ok(a + b),
'-' => Ok(a - b),
_ => Err(()),
})
Playground
Or since we're starting from a vector, which can be indexed:
nums[1..]
.iter()
.zip(ops.iter())
.try_fold(nums[0], |a, (b, op)| match op {
'+' => Ok(a + b),
'-' => Ok(a - b),
_ => Err(()),
});
Playground

Related

How to invoke a multi-argument function without creating a closure?

I came across this while doing the 2018 Advent of Code (Day 2, Part 1) solution in Rust.
The problem to solve:
Take the count of strings that have exactly two of the same letter, multiplied by the count of strings that have exactly three of the same letter.
INPUT
abcdega
hihklmh
abqasbb
aaaabcd
The first string abcdega has a repeated twice.
The second string hihklmh has h repeated three times.
The third string abqasbb has a repeated twice, and b repeated three times, so it counts for both.
The fourth string aaaabcd contains a letter repeated 4 times (not 2, or 3) so it does not count.
So the result should be:
2 strings that contained a double letter (first and third) multiplied by 2 strings that contained a triple letter (second and third) = 4
The Question:
const PUZZLE_INPUT: &str =
"
abcdega
hihklmh
abqasbb
aaaabcd
";
fn letter_counts(id: &str) -> [u8;26] {
id.chars().map(|c| c as u8).fold([0;26], |mut counts, c| {
counts[usize::from(c - b'a')] += 1;
counts
})
}
fn has_repeated_letter(n: u8, letter_counts: &[u8;26]) -> bool {
letter_counts.iter().any(|&count| count == n)
}
fn main() {
let ids_iter = PUZZLE_INPUT.lines().map(letter_counts);
let num_ids_with_double = ids_iter.clone().filter(|id| has_repeated_letter(2, id)).count();
let num_ids_with_triple = ids_iter.filter(|id| has_repeated_letter(3, id)).count();
println!("{}", num_ids_with_double * num_ids_with_triple);
}
Rust Playground
Consider line 21. The function letter_counts takes only one argument, so I can use the syntax: .map(letter_counts) on elements that match the type of the expected argument. This is really nice to me! I love that I don't have to create a closure: .map(|id| letter_counts(id)). I find both to be readable, but the former version without the closure is much cleaner to me.
Now consider lines 22 and 23. Here, I have to use the syntax: .filter(|id| has_repeated_letter(3, id)) because the has_repeated_letter function takes two arguments. I would really like to do .filter(has_repeated_letter(3)) instead.
Sure, I could make the function take a tuple instead, map to a tuple and consume only a single argument... but that seems like a terrible solution. I'd rather just create the closure.
Leaving out the only argument is something that Rust lets you do. Why would it be any harder for the compiler to let you leave out the last argument, provided that it has all of the other n-1 arguments for a function that takes n arguments.
I feel like this would make the syntax a lot cleaner, and it would fit in a lot better with the idiomatic functional style that Rust prefers.
I am certainly no expert in compilers, but implementing this behavior seems like it would be straightforward. If my thinking is incorrect, I would love to know more about why that is so.
No, you cannot pass a function with multiple arguments as an implicit closure.
In certain cases, you can choose to use currying to reduce the arity of a function. For example, here we reduce the add function from 2 arguments to one:
fn add(a: i32, b: i32) -> i32 {
a + b
}
fn curry<A1, A2, R>(f: impl FnOnce(A1, A2) -> R, a1: A1) -> impl FnOnce(A2) -> R {
move |a2| f(a1, a2)
}
fn main() {
let a = Some(1);
a.map(curry(add, 2));
}
However, I agree with the comments that this isn't a benefit:
It's not any less typing:
a.map(curry(add, 2));
a.map(|v| add(v, 2));
The curry function is extremely limited: it chooses to use FnOnce, but Fn and FnMut also have use cases. It only applies to a function with two arguments.
However, I have used this higher-order function trick in other projects, where the amount of code that is added is much greater.

Recursive function that returns a Vec

I continue to struggle with the concept of recursion. I have a function that takes a u64 and returns a Vec<u64> of factors of that integer. I would like to recursively call this function on each item in the Vec, returning a flattened Vec until the function returns Vec<self> for each item, i.e., each item is prime.
fn prime_factors(x: u64) -> Vec<u64> {
let factors = factoring_method(x);
factors.iter().flat_map(|&i| factoring_method(i)).collect()
}
(The complete code)
This returns only the Vec of factors of the final iteration and also has no conditional that allows it to keep going until items are all prime.
The factoring_method is a congruence of squares that I'm pretty happy with. I'm certain there's lots of room for optimization, but I'm hoping to get a working version complete before refactoring. I think the recursion should in the congruence_of_squares — calling itself upon each member of the Vec it returns, but I'm not sure how to frame the conditional to keep it from doing so infinitely.
Useful recursion requires two things:
That a function call itself, either directly or indirectly.
That there be some terminating case.
One definition of the prime factorization of a number is:
if the number is prime, that is the only prime factor
otherwise, combine the prime factors of a pair of factors of the number
From that, we can identify a termination condition ("if it's prime") and the recursive call ("prime factors of factors").
Note that we haven't written any code yet — everything to this point is conceptual.
We can then transcribe the idea to Rust:
fn prime_factors(x: u64) -> Vec<u64> {
if is_prime(x) {
vec![x]
} else {
factors(x).into_iter().flat_map(prime_factors).collect()
}
}
Interesting pieces here:
We use into_iter to avoid the need to dereference the iterated value.
We can pass the function name directly as the closure because the types align.
Some (inefficient) helper functions round out the implementation:
fn is_prime(x: u64) -> bool {
!(2..x).any(|i| x % i == 0)
}
fn factors(x: u64) -> Vec<u64> {
match (2..x).filter(|i| x % i == 0).next() {
Some(v) => vec![v, x / v],
None => vec![],
}
}

Concatenate a vector of vectors of strings

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.
The best I could do so far has been the following:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.
I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.
So let me try and trace the types of each expression:
- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore
vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.
I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
let c = b.collect();
print_type_of(&c);
let vals : Vec<&String> = c;
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Clearly, U=Iter<String>... Please help me clear up this mess.
EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.
EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
let vals : Vec<&String> = b.collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.
We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.
To minimize time spent growing and reallocating the vector, calculate the space needed up front.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let size = vecs.iter().fold(0, |a, b| a + b.len());
vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
acc.extend(v); acc
})
}
If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */
The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)
vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()

Extract nth element of a tuple

For a list, you can do pattern matching and iterate until the nth element, but for a tuple, how would you grab the nth element?
TL;DR; Stop trying to access directly the n-th element of a t-uple and use a record or an array as they allow random access.
You can grab the n-th element by unpacking the t-uple with value deconstruction, either by a let construct, a match construct or a function definition:
let ivuple = (5, 2, 1, 1)
let squared_sum_let =
let (a,b,c,d) = ivuple in
a*a + b*b + c*c + d*d
let squared_sum_match =
match ivuple with (a,b,c,d) -> a*a + b*b + c*c + d*d
let squared_sum_fun (a,b,c,d) =
a*a + b*b + c*c + d*d
The match-construct has here no virtue over the let-construct, it is just included for the sake of completeness.
Do not use t-uples, Don¹
There are only a few cases where using t-uples to represent a type is the right thing to do. Most of the times, we pick a t-uple because we are too lazy to define a type and we should interpret the problem of accessing the n-th field of a t-uple or iterating over the fields of a t-uple as a serious signal that it is time to switch to a proper type.
There are two natural replacements to t-uples: records and arrays.
When to use records
We can see a record as a t-uple whose entries are labelled; as such, they are definitely the most natural replacement to t-uples if we want to access them directly.
type ivuple = {
a: int;
b: int;
c: int;
d: int;
}
We then access directly the field a of a value x of type ivuple by writing x.a. Note that records are easily copied with modifications, as in let y = { x with d = 0 }. There is no natural way to iterate over the fields of a record, mostly because a record do not need to be homogeneous.
When to use arrays
A large² homogeneous collection of values is adequately represented by an array, which allows direct access, iterating and folding. A possible inconvenience is that the size of an array is not part of its type, but for arrays of fixed size, this is easily circumvented by introducing a private type — or even an abstract type. I described an example of this technique in my answer to the question “OCaml compiler check for vector lengths”.
Note on float boxing
When using floats in t-uples, in records containing only floats and in arrays, these are unboxed. We should therefore not notice any performance modification when changing from one type to the other in our numeric computations.
¹ See the TeXbook.
² Large starts near 4.
Since the length of OCaml tuples is part of the type and hence known (and fixed) at compile time, you get the n-th item by straightforward pattern matching on the tuple. For the same reason, the problem of extracting the n-th element of an "arbitrary-length tuple" cannot occur in practice - such a "tuple" cannot be expressed in OCaml's type system.
You might still not want to write out a pattern every time you need to project a tuple, and nothing prevents you from generating the functions get_1_1...get_i_j... that extract the i-th element from a j-tuple for any possible combination of i and j occuring in your code, e.g.
let get_1_1 (a) = a
let get_1_2 (a,_) = a
let get_2_2 (_,a) = a
let get_1_3 (a,_,_) = a
let get_2_3 (_,a,_) = a
...
Not necessarily pretty, but possible.
Note: Previously I had claimed that OCaml tuples can have at most length 255 and you can simply generate all possible tuple projections once and for all. As #Virgile pointed out in the comments, this is incorrect - tuples can be huge. This means that it is impractical to generate all possible tuple projection functions upfront, hence the restriction "occurring in your code" above.
It's not possible to write such a function in full generality in OCaml. One way to see this is to think about what type the function would have. There are two problems. First, each size of tuple is a different type. So you can't write a function that accesses elements of tuples of different sizes. The second problem is that different elements of a tuple can have different types. Lists don't have either of these problems, which is why you can have List.nth.
If you're willing to work with a fixed size tuple whose elements are all the same type, you can write a function as shown by #user2361830.
Update
If you really have collections of values of the same type that you want to access by index, you should probably be using an array.
here is a function wich return you the string of the ocaml function you need to do that ;) very helpful I use it frequently.
let tup len n =
if n>=0 && n<len then
let rec rep str nn = match nn<1 with
|true ->""
|_->str ^ (rep str (nn-1))in
let txt1 ="let t"^(string_of_int len)^"_"^(string_of_int n)^" tup = match tup with |" ^ (rep "_," n) ^ "a" and
txt2 =","^(rep "_," (len-n-2)) and
txt3 ="->a" in
if n = len-1 then
print_string (txt1^txt3)
else
print_string (txt1^txt2^"_"^txt3)
else raise (Failure "Error") ;;
For example:
tup 8 6;;
return:
let t8_6 tup = match tup with |_,_,_,_,_,_,a,_->a
and of course:
val t8_6 : 'a * 'b * 'c * 'd * 'e * 'f * 'g * 'h -> 'g = <fun>

Implementing quicksort in a functional language

I need to implement quicksort in SML for a homework assignment, and I'm lost. I was previously unfamiliar with how quicksort was implemented, so I read up on that, but every implementation I read about was in an imperative one. These don't look too difficult, but I had no idea how to implement quicksort functionally.
Wikipedia happens to have quicksort code in Standard ML (which is the language required for my assignment), but I don't understand how it's working.
Wikipedia code:
val filt = List.filter
fun quicksort << xs = let
fun qs [] = []
| qs [x] = [x]
| qs (p::xs) = let
val lessThanP = (fn x => << (x, p))
in
qs (filt lessThanP xs) # p :: (qs (filt (not o lessThanP) xs))
end
in
qs xs
end
In particular, I don't understand this line: qs (filt lessThanP xs) # p :: (qs (filt (not o lessThanP) xs)). filt will return a list of everything in xs less than p*, which is concatenated with p, which is cons-ed onto everything >= p.*
*assuming the << (x, p) function returns true when x < p. Of course it doesn't have to be that.
Actually, typing this out is helping me understand what's going on a bit. Anyways, I'm trying to compare that SML function to wiki's quicksort pseudocode, which follows.
function quicksort(array, 'left', 'right')
// If the list has 2 or more items
if 'left' < 'right'
// See "Choice of pivot" section below for possible choices
choose any 'pivotIndex' such that 'left' ≤ 'pivotIndex' ≤ 'right'
// Get lists of bigger and smaller items and final position of pivot
'pivotNewIndex' := partition(array, 'left', 'right', 'pivotIndex')
// Recursively sort elements smaller than the pivot
quicksort(array, 'left', 'pivotNewIndex' - 1)
// Recursively sort elements at least as big as the pivot
quicksort(array, 'pivotNewIndex' + 1, 'right')
Where partition is defined as
// left is the index of the leftmost element of the array
// right is the index of the rightmost element of the array (inclusive)
// number of elements in subarray = right-left+1
function partition(array, 'left', 'right', 'pivotIndex')
'pivotValue' := array['pivotIndex']
swap array['pivotIndex'] and array['right'] // Move pivot to end
'storeIndex' := 'left'
for 'i' from 'left' to 'right' - 1 // left ≤ i < right
if array['i'] < 'pivotValue'
swap array['i'] and array['storeIndex']
'storeIndex' := 'storeIndex' + 1
swap array['storeIndex'] and array['right'] // Move pivot to its final place
return 'storeIndex'
So, where exactly is the partitioning happening? Or am I thinking about SMLs quicksort wrongly?
So, where exactly is the partitioning happening? Or am I thinking about SMLs quicksort wrongly?
A purely functional implementation of quicksort works by structural recursion on the input list (IMO, this is worth mentioning). Moreover, as you see, the two calls to "filt" allow you to partition the input list into two sublists (say A and B), which then can be treated individually. What is important here is that:
all elements of A are less than or equal to the pivot element ("p" in the code)
all elements of B are greater than the pivot element
An imperative implementation works in-place, by swapping elements in the same array. In the pseudocode you've provided, the post-invariant of the "partition" function is that you have two subarrays, one starting at 'left' of input array (and ending at 'pivotIndex'), and another starting right after 'pivotIndex' and ending at 'right'. What is important here is that the two subarrays can be seen as representations of the sublists A and B.
I think that by now, you have the idea where the partitioning step is happening (or conversely, how the imperative and functional are related).
You said this:
filt will return a list of everything in xs less than p*, which is concatenated with p, which is cons-ed onto everything >= p.*
That's not quite accurate. filt will return a list of everything in xs less than p, but that new list isn't immediately concatenated with p. The new list is in fact passed to qs (recursively), and whatever qs returns is concatenated with p.
In the pseudocode version, the partitioning happens in-place in the array variable. That's why you see swap in the partition loop. Doing the partitioning in-place is much better for performance than making a copy.

Resources