Concatenate a vector of vectors of strings - vector

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.
The best I could do so far has been the following:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.
I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.
So let me try and trace the types of each expression:
- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore
vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.
I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
let c = b.collect();
print_type_of(&c);
let vals : Vec<&String> = c;
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Clearly, U=Iter<String>... Please help me clear up this mess.
EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.
EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
let vals : Vec<&String> = b.collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.
We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.
To minimize time spent growing and reallocating the vector, calculate the space needed up front.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let size = vecs.iter().fold(0, |a, b| a + b.len());
vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
acc.extend(v); acc
})
}
If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */
The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)
vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()

Related

How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate]

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its arguments. Either one has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect .
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's an time period where your value would have an undefined value, which is not safe. If a panic were to happen at that exact second, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of having to deal with unsafe code and reasoning why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

How to extract elements of a vector of strings in Rust?

Let's say I have the following code:
fn extract() -> Vec<String> {
let data = vec!["aaa".to_string(), "bbb".to_string(), "ccc".to_string()];
vec![data[0], data[2]]
}
In practice, I read data from a file.
Obviously, this doesn't compile because I'm pulling strings out of the vector data, leaving the vector in an undefined state. But, conceptually, it should work, because I'm not using data afterwards anyway.
I can use mem::replace, but this seems crazy:
fn extract() -> Vec<String> {
let mut data = vec!["aaa".to_string(), "bbb".to_string(), "ccc".to_string()];
let a = mem::replace(&mut data[0], "".to_string());
let c = mem::replace(&mut data[2], "".to_string());
vec![a, c]
}
How do I go about extracting specific elements from the vector without having to clone the strings?
Vec has special methods for that. swap_remove, remove (warning, linear complexity), drain. For example,
fn extract() -> Vec<String> {
let mut data = vec!["aaa".to_string(), "bbb".to_string(), "ccc".to_string()];
// order does matter
vec![data.swap_remove(2), data.swap_remove(0)]
}
You cannot have "holes" in a vector. So when you move something out of it, you either change the indices of the remaining elements (using remove or swap_remove), or replace it with something else.
Why don't you just consume the vector in order and ignore what you don't need? If you need to save some of the elements for later use, you can use data.iter().filter(...).collect(). If you really want to avoid copying any strings, you can wrap them in Rc so that only pointers are copied.

How to invoke a multi-argument function without creating a closure?

I came across this while doing the 2018 Advent of Code (Day 2, Part 1) solution in Rust.
The problem to solve:
Take the count of strings that have exactly two of the same letter, multiplied by the count of strings that have exactly three of the same letter.
INPUT
abcdega
hihklmh
abqasbb
aaaabcd
The first string abcdega has a repeated twice.
The second string hihklmh has h repeated three times.
The third string abqasbb has a repeated twice, and b repeated three times, so it counts for both.
The fourth string aaaabcd contains a letter repeated 4 times (not 2, or 3) so it does not count.
So the result should be:
2 strings that contained a double letter (first and third) multiplied by 2 strings that contained a triple letter (second and third) = 4
The Question:
const PUZZLE_INPUT: &str =
"
abcdega
hihklmh
abqasbb
aaaabcd
";
fn letter_counts(id: &str) -> [u8;26] {
id.chars().map(|c| c as u8).fold([0;26], |mut counts, c| {
counts[usize::from(c - b'a')] += 1;
counts
})
}
fn has_repeated_letter(n: u8, letter_counts: &[u8;26]) -> bool {
letter_counts.iter().any(|&count| count == n)
}
fn main() {
let ids_iter = PUZZLE_INPUT.lines().map(letter_counts);
let num_ids_with_double = ids_iter.clone().filter(|id| has_repeated_letter(2, id)).count();
let num_ids_with_triple = ids_iter.filter(|id| has_repeated_letter(3, id)).count();
println!("{}", num_ids_with_double * num_ids_with_triple);
}
Rust Playground
Consider line 21. The function letter_counts takes only one argument, so I can use the syntax: .map(letter_counts) on elements that match the type of the expected argument. This is really nice to me! I love that I don't have to create a closure: .map(|id| letter_counts(id)). I find both to be readable, but the former version without the closure is much cleaner to me.
Now consider lines 22 and 23. Here, I have to use the syntax: .filter(|id| has_repeated_letter(3, id)) because the has_repeated_letter function takes two arguments. I would really like to do .filter(has_repeated_letter(3)) instead.
Sure, I could make the function take a tuple instead, map to a tuple and consume only a single argument... but that seems like a terrible solution. I'd rather just create the closure.
Leaving out the only argument is something that Rust lets you do. Why would it be any harder for the compiler to let you leave out the last argument, provided that it has all of the other n-1 arguments for a function that takes n arguments.
I feel like this would make the syntax a lot cleaner, and it would fit in a lot better with the idiomatic functional style that Rust prefers.
I am certainly no expert in compilers, but implementing this behavior seems like it would be straightforward. If my thinking is incorrect, I would love to know more about why that is so.
No, you cannot pass a function with multiple arguments as an implicit closure.
In certain cases, you can choose to use currying to reduce the arity of a function. For example, here we reduce the add function from 2 arguments to one:
fn add(a: i32, b: i32) -> i32 {
a + b
}
fn curry<A1, A2, R>(f: impl FnOnce(A1, A2) -> R, a1: A1) -> impl FnOnce(A2) -> R {
move |a2| f(a1, a2)
}
fn main() {
let a = Some(1);
a.map(curry(add, 2));
}
However, I agree with the comments that this isn't a benefit:
It's not any less typing:
a.map(curry(add, 2));
a.map(|v| add(v, 2));
The curry function is extremely limited: it chooses to use FnOnce, but Fn and FnMut also have use cases. It only applies to a function with two arguments.
However, I have used this higher-order function trick in other projects, where the amount of code that is added is much greater.

How can I search through a 2D array without for loops?

I have a Vec<Vec<char>> and I want to find all the x,y positions of a specific character, let's say 'x'. I can use a double for loop with enumerate and manually build up the solution (and I would guess this is the sane thing to do), but is there a nice way to do it with nothing but iterators?
More or less I'm looking for ways to clean this up:
let locs: Vec<(usize, (usize, &char))> = grid.iter()
.enumerate()
.flat_map(|(ind, row)|
iter::repeat(ind)
.zip(row.iter()
.enumerate()))
.filter(|&(x, (y, ch))| ch == 'x')
.collect();
For one, is there a way to flatten the tuples?
Here's my attempt, which does flatten the tuples:
let locs: Vec<(usize, usize, char)> = grid.iter()
.enumerate()
.flat_map(|(y, row)| {
row.iter()
.enumerate()
.map(move |(x, &c)| (x,y,c))
})
.filter(|&(_,_,c)| c == 'x')
.collect();
println!("{:?}", locs)
Playground
My approach was to first flatten to (x,y,c) and then filter. I took the liberty of returning the actual chars rather than references.
The move closure was needed because otherwise the inner closure (which lives longer, inside the iterator, than the outer closure) had a reference to the outer y.
If I wanted to do this more often, I would write an Iterator implementation which let me do:
let locs: Vec<(usize, usize, char) =
iter2d(grid)
.filter(&|_,_,c| c == 'x')
.collect();
The implementation is left as an exercise for the reader. :-)

Build HashSet from a vector in Rust

I want to build a HashSet<u8> from a Vec<u8>. I'd like to do this
in one line of code,
copying the data only once,
using only 2n memory,
but the only thing I can get to compile is this piece of .. junk, which I think copies the data twice and uses 3n memory.
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
let mut victim = vec.clone();
let x: HashSet<u8> = victim.drain(..).collect();
return x;
}
I was hoping to write something simple, like this:
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
return HashSet::from_iter(vec.iter());
}
but that won't compile:
error[E0308]: mismatched types
--> <anon>:5:12
|
5 | return HashSet::from_iter(vec.iter());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected u8, found &u8
|
= note: expected type `std::collections::HashSet<u8>`
= note: found type `std::collections::HashSet<&u8, _>`
.. and I don't really understand the error message, probably because I need to RTFM.
Because the operation does not need to consume the vector¹, I think it should not consume it. That only leads to extra copying somewhere else in the program:
use std::collections::HashSet;
use std::iter::FromIterator;
fn hashset(data: &[u8]) -> HashSet<u8> {
HashSet::from_iter(data.iter().cloned())
}
Call it like hashset(&v) where v is a Vec<u8> or other thing that coerces to a slice.
There are of course more ways to write this, to be generic and all that, but this answer sticks to just introducing the thing I wanted to focus on.
¹This is based on that the element type u8 is Copy, i.e. it does not have ownership semantics.
The following should work nicely; it fulfills your requirements:
use std::collections::HashSet;
use std::iter::FromIterator;
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
HashSet::from_iter(vec)
}
from_iter() works on types implementing IntoIterator, so a Vec argument is sufficient.
Additional remarks:
you don't need to explicitly return function results; you only need to omit the semi-colon in the last expression in its body
I'm not sure which version of Rust you are using, but on current stable (1.12) to_iter() doesn't exist
Converting Vec to HashSet
Moving data ownership
let vec: Vec<u8> = vec![1, 2, 3, 4];
let hash_set: HashSet<u8> = vec.into_iter().collect();
Cloning data
let vec: Vec<u8> = vec![1, 2, 3, 4];
let hash_set: HashSet<u8> = vec.iter().cloned().collect();

Resources