Build HashSet from a vector in Rust

I want to build a HashSet<u8> from a Vec<u8>. I'd like to do this
in one line of code,
copying the data only once,
using only 2n memory,
but the only thing I can get to compile is this piece of .. junk, which I think copies the data twice and uses 3n memory.
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
let mut victim = vec.clone();
let x: HashSet<u8> = victim.drain(..).collect();
return x;
}
I was hoping to write something simple, like this:
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
return HashSet::from_iter(vec.iter());
}
but that won't compile:
error[E0308]: mismatched types
--> <anon>:5:12
|
5 | return HashSet::from_iter(vec.iter());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected u8, found &u8
|
= note: expected type `std::collections::HashSet<u8>`
= note: found type `std::collections::HashSet<&u8, _>`
.. and I don't really understand the error message, probably because I need to RTFM.

Because the operation does not need to consume the vector¹, I think it should not consume it. Consuming it only leads to extra copying somewhere else in the program:
use std::collections::HashSet;
use std::iter::FromIterator;
fn hashset(data: &[u8]) -> HashSet<u8> {
HashSet::from_iter(data.iter().cloned())
}
Call it like hashset(&v) where v is a Vec<u8> or other thing that coerces to a slice.
There are of course more ways to write this, to be generic and all that, but this answer sticks to just introducing the thing I wanted to focus on.
¹This relies on the element type u8 being Copy, i.e. it does not have ownership semantics.

The following should work nicely; it fulfills your requirements:
use std::collections::HashSet;
use std::iter::FromIterator;
fn vec_to_set(vec: Vec<u8>) -> HashSet<u8> {
HashSet::from_iter(vec)
}
from_iter() works on types implementing IntoIterator, so a Vec argument is sufficient.
Additional remarks:
you don't need to explicitly return function results; you only need to omit the semi-colon in the last expression in its body
I'm not sure which version of Rust you are using, but on current stable (1.12) to_iter() doesn't exist

Converting Vec to HashSet
Moving data ownership
let vec: Vec<u8> = vec![1, 2, 3, 4];
let hash_set: HashSet<u8> = vec.into_iter().collect();
Cloning data
let vec: Vec<u8> = vec![1, 2, 3, 4];
let hash_set: HashSet<u8> = vec.iter().cloned().collect();
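As a minimal sketch, the two variants above can be wrapped in functions. The names `set_by_move` and `set_by_clone` are made up for illustration and do not come from the answers.

```rust
use std::collections::HashSet;

/// Consumes the vector; each element is moved into the set exactly once.
fn set_by_move(vec: Vec<u8>) -> HashSet<u8> {
    vec.into_iter().collect()
}

/// Borrows a slice and copies each element, leaving the caller's data usable.
fn set_by_clone(data: &[u8]) -> HashSet<u8> {
    data.iter().cloned().collect()
}
```

Calling `set_by_clone(&v)` keeps `v` available afterwards, while `set_by_move(v)` gives up ownership of `v`.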

Related

How to perform a `flat_map` (or similar operation) on an iterator N times without runtime polymorphism?

I want to be able to repeat a process in which the collection we are iterating over is transformed n times. n is only known at runtime, and can be specified by the user, so we cannot hard-code it into the type.
An approach that uses intermediate data structures by collect-ing between iterations is possible, like so:
let n = 10;
let mut vec1 = vec![1, 2, 3];
{
for _index in 0..n {
let temp_vec = vec1.into_iter().flat_map(|x| vec![x, x * 2]).collect();
vec1 = temp_vec;
}
}
However, this seems wasteful, because we are creating intermediate data structures, so I went looking for a solution that chains iterators directly.
At first I thought one could just do something like:
let mut iter = vec![1, 2, 3].into_iter();
for index in 0..n {
iter = iter.flat_map(|x| vec![x, x * 2].into_iter());
}
However, this does not work because in Rust, all functions on iterators return their own kind of 'compound iterator' struct. (In for instance Haskell, functions on iterators return the appropriate kind of result iterator, which does not become a 'bigger and bigger compound type'.)
Rewriting this as a recursive function had similar problems because (a) I was returning 'some kind of Iterator' whose type was (near?)-impossible to write out by hand because of the recursion, and (b) this type was different in the base case from the recursive case.
I found this question about conditionally returning either one or the other iterator type, as well as using impl Iterator to indicate that we return some concrete type that implements the Iterator trait, but we do not care about its exact nature.
A similar example to the code in the linked answer has been implemented in the code below as maybe_flatmap. This works.
However, I don't want to run flat_map zero or one time, but rather N times on the incoming iterator. Therefore, I adapted the code to call itself recursively up to a depth of N.
Attempting to do that, then makes the Rust compiler complain with an error[E0720]: opaque type expands to a recursive type:
use either::Either; // 1.5.3
/// Later we want to work with any appropriate items,
/// but for simplicity's sake, just use plain integers for now.
type I = u64;
/// Works, but limited to single level.
fn maybe_flatmap<T: Iterator<Item = I>>(iter: T, flag: bool) -> impl Iterator<Item = I> {
match flag {
false => Either::Left(iter),
true => Either::Right(iter.flat_map(move |x| vec![x, x * 2].into_iter())),
}
}
/// Does not work: opaque type expands to a recursive type!
fn rec_flatmap<T: Iterator<Item = I>>(iter: T, depth: usize) -> impl Iterator<Item = I> {
match depth {
0 => Either::Left(iter),
_ => {
let iter2 = iter.flat_map(move |x| vec![x, x * 2]).into_iter();
Either::Right(rec_flatmap(iter2, depth - 1))
}
}
}
fn main() {
let xs = vec![1, 2, 3, 4];
let xs2 = xs.into_iter();
let xs3 = maybe_flatmap(xs2, true);
let xs4: Vec<_> = xs3.collect();
println!("{:?}", xs4);
let ys = vec![1, 2, 3, 4];
let ys2 = ys.into_iter();
let ys3 = rec_flatmap(ys2, 5);
let ys4: Vec<_> = ys3.collect();
println!("{:?}", ys4);
}
error[E0720]: opaque type expands to a recursive type
--> src/main.rs:16:65
|
16 | fn rec_flatmap<T: Iterator<Item = I>>(iter: T, depth: usize) -> impl Iterator<Item = I> {
| ^^^^^^^^^^^^^^^^^^^^^^^ expands to a recursive type
|
= note: expanded type is `either::Either<T, impl std::iter::Iterator>`
I am stuck.
Since, regardless of how often you flat_map, the final answer is going to be (an iterator over) a vector of integers, it seems like there ought to be a way of writing this function using only a single concrete return type.
Is this possible? Is there a way out of this situation without resorting to runtime polymorphism?
I believe/hope that a solution without dynamic polymorphism (trait objects or the like) is possible because regardless of how often you call flat_map the end result should (at least morally) have the same type. I hope there is a way to shoehorn the (non-matching) nested FlatMap struct into a single matching static type somehow.
Is there a way to resolve this without runtime polymorphism?
No.
To solve it using a trait object:
let mut iter: Box<dyn Iterator<Item = i32>> = Box::new(vec![1, 2, 3].into_iter());
for _ in 0..n {
iter = Box::new(iter.flat_map(|x| vec![x, x * 2].into_iter()));
}
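The trait-object loop above can be packaged as a function, shown here as a rough sketch. The name `flat_map_n` is hypothetical; the doubling closure is the one from the question.

```rust
/// Applies the doubling `flat_map` step `n` times, boxing after each step so
/// the variable keeps a single type no matter how deep the nesting goes.
fn flat_map_n(items: Vec<i32>, n: usize) -> Box<dyn Iterator<Item = i32>> {
    let mut iter: Box<dyn Iterator<Item = i32>> = Box::new(items.into_iter());
    for _ in 0..n {
        // Each pass wraps the previous boxed iterator in a new `FlatMap`
        // and immediately erases that concrete type again.
        iter = Box::new(iter.flat_map(|x| vec![x, x * 2]));
    }
    iter
}
```

With this sketch, `flat_map_n(vec![1, 2, 3], 1).collect::<Vec<_>>()` produces `[1, 2, 2, 4, 3, 6]`, and every additional level doubles the number of items.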
regardless of how often you call flat_map the end result should (at least morally) have the same type
I don't know which morality to apply to type systems, but the literal size in memory is (very likely to be) different for FlatMap<...> and FlatMap<FlatMap<...>>. They are different types.
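One way to see the size difference is to measure it. The sketch below is only illustrative: the helper name `adaptor_sizes` is invented, and the exact byte counts are an implementation detail of the standard library that may change between versions.

```rust
use std::mem::size_of_val;

/// Returns the in-memory sizes of a once-nested and a twice-nested
/// `flat_map` adaptor built over the same kind of source iterator.
fn adaptor_sizes() -> (usize, usize) {
    let once = vec![1, 2, 3].into_iter().flat_map(|x| vec![x, x * 2]);
    let twice = vec![1, 2, 3]
        .into_iter()
        .flat_map(|x| vec![x, x * 2])
        .flat_map(|x| vec![x, x * 2]);
    // Neither iterator is consumed; only the size of its state is measured.
    (size_of_val(&once), size_of_val(&twice))
}
```

The second value is larger because the outer adaptor stores the whole inner adaptor plus its own bookkeeping, which is why the two cannot share one static type.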
See also:
Conditionally iterate over one of several possible iterators
Creating Diesel.rs queries with a dynamic number of .and()'s
How do I iterate over a Vec of functions returning Futures in Rust?
How can I extend the lifetime of a temporary variable inside of an iterator adaptor in Rust?
Why does Iterator::take_while take ownership of the iterator?

What is the best way to repeat the elements in a vector in Rust?

I found this way, but it seems too verbose for such a common action:
fn double_vec(vec: Vec<i32>) -> Vec<i32> {
let mut vec1 = vec.clone();
let vec2 = vec.clone();
vec1.extend(vec2);
vec1
}
I know that in JavaScript it could be just arr2 = [...arr1, ...arr1].
"Doubling a vector" isn't something that's really done very often so there's no shortcut for it. In addition, it matters what is inside the Vec because that changes what operations can be performed on it. In this specific example, the following code works:
let x = vec![1, 2, 3];
let y: Vec<_> = x.iter().cycle().take(x.len() * 2).collect();
println!("{:?}", y); //[1, 2, 3, 1, 2, 3]
The cycle() method requires that the items in the Iterator implement the Clone trait so that the items can be duplicated. So if the items in your Vec implement Clone, then this will work. Since immutable references (&) implement Clone, a Vec<&Something> will work but mutable references (&mut) do not implement Clone and thus a Vec<&mut Something> will not work.
Note that even if a type does not implement Clone, you can still clone references to that type:
struct Test;
fn test_if_clone<T: Clone>(_x: T) {}
fn main() {
let x = Test;
test_if_clone(x); //error[E0277]: the trait bound `Test: std::clone::Clone` is not satisfied
let y = &x;
test_if_clone(y); //ok
}
You can use the concat method for this, it's simple:
fn double_vec(v: Vec<i32>) -> Vec<i32> {
[&v[..], &v[..]].concat()
}
Unfortunately we have to make the vectors slices explicitly (here &v[..]); but otherwise this method is good because it allocates the result to the needed size directly and then does the copies.
Building on Wesley's answer, you can also use chain to glue two iterables together, one after the other. In the below example I use the same Vec's iter() method twice:
let x = vec![1, 2, 3];
let y: Vec<_> = x.iter().chain(x.iter()).collect();
println!("{:?}", y); //[1, 2, 3, 1, 2, 3]
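Note that `x.iter()` yields references, so `y` in the snippet above is a `Vec<&i32>`. If owned values are wanted, a sketch along these lines should do; the function name `doubled` is made up, and `copied()` assumes the element type is `Copy`.

```rust
/// Returns the elements of `x` twice in a row as owned values.
fn doubled(x: &[i32]) -> Vec<i32> {
    // `iter()` yields `&i32`; `copied()` turns each item into an `i32`.
    x.iter().chain(x.iter()).copied().collect()
}
```

For element types that are `Clone` but not `Copy`, `cloned()` plays the same role.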
The iterator methods are likely to be a lot less efficient than the straight memcpy that vector extension is.
Your own code does one clone too many; you can just reuse the by-value input:
fn double_vec(mut vec: Vec<i32>) -> Vec<i32> {
let clone = vec.clone();
vec.extend(clone);
vec
}
However, the nature of a Vec means this is likely to require a copy even if you managed to remove that clone, so you're not generally gaining much over just using concat.
Using concat on slices is fairly efficient, as it will preallocate the Vec in advance and then perform an efficient extend_from_slice. However, this does mean it's no longer particularly sensible to take a Vec as input; writing the following is strictly more flexible.
fn double_slice(slice: &[i32]) -> Vec<i32> {
[slice, slice].concat()
}
Since Rust 1.53, Vec::extend_from_within makes it possible to more efficiently double a vector:
fn double_vec(vec: &mut Vec<i32>) {
vec.extend_from_within(..);
}
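Since `extend_from_within` accepts any range, it can also repeat just part of the vector. A small sketch, with the helper names `double_in_place` and `repeat_tail` invented for illustration:

```rust
/// Appends a copy of the vector's current contents to itself.
/// Requires Rust 1.53 or later for `Vec::extend_from_within`.
fn double_in_place(vec: &mut Vec<i32>) {
    vec.extend_from_within(..);
}

/// Appends a copy of the elements from index `start` to the end.
/// Panics if `start` is greater than the vector's length.
fn repeat_tail(vec: &mut Vec<i32>, start: usize) {
    vec.extend_from_within(start..);
}
```

Both functions modify the vector in place, so the caller passes `&mut v` and no second vector is allocated by the caller.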

Print Vec using a placeholder [duplicate]

I tried the following code:
fn main() {
let v2 = vec![1; 10];
println!("{}", v2);
}
But the compiler complains:
error[E0277]: `std::vec::Vec<{integer}>` doesn't implement `std::fmt::Display`
--> src/main.rs:3:20
|
3 | println!("{}", v2);
| ^^ `std::vec::Vec<{integer}>` cannot be formatted with the default formatter
|
= help: the trait `std::fmt::Display` is not implemented for `std::vec::Vec<{integer}>`
= note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
= note: required by `std::fmt::Display::fmt`
Does anyone implement this trait for Vec<T>?
let v2 = vec![1; 10];
println!("{:?}", v2);
{} is for strings and other values which can be displayed directly to the user. There's no single way to show a vector to a user.
The {:?} formatter can be used to debug it, and it will look like:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Display is the trait that provides the method behind {}, and Debug is for {:?}
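To make the Display/Debug split concrete, here is a small sketch with a hypothetical `Point` type (not from the question) that derives `Debug` and implements `Display` by hand:

```rust
use std::fmt;

/// Hypothetical example type; `Debug` is derived, `Display` is hand-written.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // This is what `{}` uses; `{:?}` uses the derived `Debug` impl.
        write!(f, "({}, {})", self.x, self.y)
    }
}
```

With this type, `{}` prints the user-facing form chosen in `fmt`, while `{:?}` prints the structural form generated by the derive.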
Does anyone implement this trait for Vec<T> ?
No.
And surprisingly, this is a demonstrably correct answer, which is rare since proving the absence of things is usually hard or impossible. So how can we be so certain?
Rust has very strict coherence rules: an impl Trait for Struct can only be written:
either in the same crate as Trait
or in the same crate as Struct
and nowhere else; let's try it:
impl<T> std::fmt::Display for Vec<T> {
fn fmt(&self, _: &mut std::fmt::Formatter) -> Result<(), std::fmt::Error> {
Ok(())
}
}
yields:
error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g., `MyStruct<T>`)
--> src/main.rs:1:1
|
1 | impl<T> std::fmt::Display for Vec<T> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type parameter `T` must be used as the type parameter for some local type
|
= note: only traits defined in the current crate can be implemented for a type parameter
Furthermore, to use a trait, it needs to be in scope (and therefore you need to be linked to its crate), which means that:
you are linked with both the crate of Display and the crate of Vec
neither implements Display for Vec
which leads us to conclude that no one implements Display for Vec.
As a workaround, as indicated by Manishearth, you can use the Debug trait, which is invoked with "{:?}" as a format specifier.
If you know the type of the elements that the vector contains, you could make a struct that takes vector as an argument and implement Display for that struct.
use std::fmt::{Display, Error, Formatter};
struct NumVec(Vec<u32>);
impl Display for NumVec {
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        let mut comma_separated = String::new();
        // Put ", " before every element except the first; this also
        // handles an empty vector without panicking.
        for (i, num) in self.0.iter().enumerate() {
            if i > 0 {
                comma_separated.push_str(", ");
            }
            comma_separated.push_str(&num.to_string());
        }
        write!(f, "{}", comma_separated)
    }
}
fn main() {
    let numbers = NumVec(vec![1; 10]);
    println!("{}", numbers);
}
Here is a one-liner which should also work for you:
println!("[{}]", v2.iter().fold(String::new(), |acc, &num| acc + &num.to_string() + ", "));
In my own case, I was receiving a Vec<&str> from a function call. I did not want to change the function signature to a custom type (for which I could implement the Display trait).
For my one-off case, I was able to turn the display of my Vec into a one-liner which I used with println!() directly as follows:
println!("{}", myStrVec.iter().fold(String::new(), |acc, &arg| acc + arg));
(The lambda can be adapted for use with different data types, or for more concise Display trait implementations.)
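The fold-based one-liners above either leave a trailing separator or add none at all. A possible alternative, sketched here with the invented helper names `join_strs` and `join_display`, is the standard slice method `join`:

```rust
/// Joins string slices with ", " using the slice method `join`.
fn join_strs(items: &[&str]) -> String {
    items.join(", ")
}

/// Joins any displayable items with ", " and wraps them in brackets.
fn join_display<T: std::fmt::Display>(items: &[T]) -> String {
    // Convert each item to a `String` first, then join the pieces.
    let parts: Vec<String> = items.iter().map(|item| item.to_string()).collect();
    format!("[{}]", parts.join(", "))
}
```

This allocates an intermediate `Vec<String>` for non-string items, which is usually acceptable for printing but is a trade-off compared with writing items one by one.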
Starting with Rust 1.58, there is a slightly more concise way to print a vector (or any other variable). This lets you put the variable you want to print inside the curly braces, instead of needing to put it at the end. For the debug formatting needed to print a vector, you add :? in the braces, like this:
fn main() {
let v2 = vec![1; 10];
println!("{v2:?}");
}
Sometimes you don't want to use something like the accepted answer
let v2 = vec![1; 10];
println!("{:?}", v2);
because you want each element to be displayed using its Display trait, not its Debug trait; however, as noted, you can't implement Display on Vec because of Rust's coherence rules. Instead of implementing a wrapper struct with the Display trait, you can implement a more general solution with a function like this:
use std::fmt;
pub fn iterable_to_str<I, D>(iterable: I) -> String
where
I: IntoIterator<Item = D>,
D: fmt::Display,
{
let mut iterator = iterable.into_iter();
let head = match iterator.next() {
None => return String::from("[]"),
Some(x) => format!("[{}", x),
};
let body = iterator.fold(head, |a, v| format!("{}, {}", a, v));
format!("{}]", body)
}
which doesn't require wrapping your vector in a struct. As long as it implements IntoIterator and the element type implements Display, you can then call:
println!("{}", iterable_to_str(it));
Is there any reason not to write the vector's contents item by item, without first collecting them into a String? *)
use std::fmt::{Display, Error, Formatter};
struct NumVec(Vec<u32>);
impl Display for NumVec {
    fn fmt(&self, f: &mut Formatter) -> Result<(), Error> {
        let v = &self.0;
        if v.is_empty() {
            return Ok(());
        }
        // Write each element straight to the formatter; `?` propagates errors.
        for num in &v[0..v.len() - 1] {
            write!(f, "{}, ", &num.to_string())?;
        }
        write!(f, "{}", &v[v.len() - 1])
    }
}
fn main() {
    let numbers = NumVec(vec![1; 10]);
    println!("{}", numbers);
}
*) No there isn't.
Because we want to display something, the element type certainly implements the Display trait. Calling to_string() on it is therefore valid Rust, because the documentation says about the ToString trait:
"This trait is automatically implemented for any type which implements the Display trait. As such, ToString shouldn’t be implemented directly: Display should be implemented instead, and you get the ToString implementation for free."
In particular on microcontrollers where space is limited I definitely would go with this solution and write immediately.

Concatenate a vector of vectors of strings

I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.
The best I could do so far has been the following:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.
I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.
So let me try and trace the types of each expression:
- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore
vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.
I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
let c = b.collect();
print_type_of(&c);
let vals : Vec<&String> = c;
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Clearly, U=Iter<String>... Please help me clear up this mess.
EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.
EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let a: Iter<Vec<String>> = vecs.iter();
let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
let vals : Vec<&String> = b.collect();
vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Ask yourself: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() on both levels is actually crucial: that way we don't have to copy first the inner vectors and then the strings inside; we just receive ownership of them.
We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.
To minimize time spent growing and reallocating the vector, calculate the space needed up front.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
let size = vecs.iter().fold(0, |a, b| a + b.len());
vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
acc.extend(v); acc
})
}
If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */
The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)
vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()
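Both variants can be written a little more compactly, as in the sketch below. The function names `concat_owned` and `concat_cloned` are made up for illustration; `Iterator::flatten` needs Rust 1.29 or later.

```rust
/// Moves every string out of the nested vectors; nothing is cloned.
fn concat_owned(vecs: Vec<Vec<String>>) -> Vec<String> {
    vecs.into_iter().flatten().collect()
}

/// Leaves the input untouched and clones each string into a new vector.
fn concat_cloned(vecs: &[Vec<String>]) -> Vec<String> {
    vecs.concat()
}
```

`flatten()` here is equivalent to `flat_map(|x| x.into_iter())`, so the ownership reasoning above applies unchanged.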

Reference to a vector still prints as a vector?

Silly n00b trying to learn a bit about Rust. Here is my program:
fn main() {
let v = vec![1, 2, 3];
println!("{:?}", v);
println!("{:?}", &v);
}
Produced the output:
[1, 2, 3]
[1, 2, 3]
What is the point of the &? I was half expecting it to print a memory address.
I was originally thrown by this in the intro where it looks like they are looping through a reference. My guess is that Rust does some magic and detects it is a memory address of a vector?
What is the point of the &?
The & takes the reference of an object, as you surmised. However, there's a Debug implementation for references to Debug types that just prints out the referred-to object. This is done because Rust tends to prefer value equality over reference equality:
impl<'a, T: ?Sized + $tr> $tr for &'a T {
fn fmt(&self, f: &mut Formatter) -> Result { $tr::fmt(&**self, f) }
}
If you'd like to print the memory address, you can use {:p}:
let v = vec![1,2,3];
println!("{:p}", &v);
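A short sketch that puts the three formats side by side; the helper name `describe` is hypothetical, and the address it reports will differ from run to run.

```rust
/// Formats the same vector three ways: via the value, via a reference,
/// and as the address the reference points to.
fn describe(v: &Vec<i32>) -> (String, String, String) {
    (
        format!("{:?}", *v), // Debug of the vector itself
        format!("{:?}", v),  // Debug of the reference: same text
        format!("{:p}", v),  // Pointer formatting: a hexadecimal address
    )
}
```

The first two strings are identical, which is the behavior the question observed; only `{:p}` exposes the address.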
it looks like they are looping through a reference
The for i in foo syntax sugar calls into_iter on foo (from the IntoIterator trait), and there's an implementation of IntoIterator for &Vec that returns an iterator of references to the items in the vector:
fn into_iter(self) -> slice::Iter<'a, T> {
self.iter()
}
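In practice this means a loop over `&v` borrows the vector instead of consuming it. A minimal sketch, with the function name `sum_by_ref` invented for illustration:

```rust
/// Sums the elements by iterating over a borrowed vector.
fn sum_by_ref(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    // `for x in v` calls `IntoIterator::into_iter` on `&Vec<i32>`,
    // so each `x` is a `&i32` and the vector itself is not consumed.
    for x in v {
        total += *x;
    }
    total
}
```

After calling `sum_by_ref(&v)`, the caller can keep using `v`, which would not be the case if the loop iterated over the vector by value.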
The magic is AFAIK in the formatter rather than the compiler. See for example:
fn take_val<T>(a:Vec<T> ) {}
fn take_ref<T>(b:&Vec<T>) {}
fn main() {
let v = vec![1, 2, 3];
take_val(&v);
take_ref(&v);
}
Fails with following error:
<anon>:6:14: 6:16 error: mismatched types:
expected `collections::vec::Vec<_>`,
found `&collections::vec::Vec<_>`
(expected struct `collections::vec::Vec`,
found &-ptr) [E0308]
<anon>:6 take_val(&v);
Which suggests this is due to the formatter not showing the difference between a reference and a value. In older versions of Rust a &v would have been shown as &[1, 2, 3], if my memory serves me correctly.
& has special meaning in Rust. It's not just a reference, it's a note that the value is borrowed to one or more functions/methods.
