Iterate over the sorted elements in a collection in tuples - collections

I am trying to iterate over the sorted elements in a collection in tuples of 2 or more.
If I had a Vec, I could call
for window in my_vec.windows(2) {
// do something with window
}
but Vecs aren't implicitly sorted, which would be really nice to have. I tried to use a BTreeSet instead of a Vec, but I don't seem to be able to call windows on it.
When trying to call
for window in tree_set.iter().windows(2) {
// do something with window
}
I get the error
no method named `windows` found for type `std::collections::btree_set::Iter<'_, Card>` in the current scope

Itertools provides the tuple_windows method:
extern crate itertools;
use itertools::Itertools;
use std::collections::BTreeSet;
fn main() {
let items: BTreeSet<_> = vec![1, 3, 2].into_iter().collect();
for (a, b) in items.iter().tuple_windows() {
println!("{} < {}", a, b);
}
}
Note that windows is a method on slices, not on iterators, and it returns an iterator of subslices of the original slice. A BTreeSet presumably cannot provide that same interface because it isn't built on top of a contiguous chunk of data; some value will always be stored somewhere other than immediately before the value that follows it.
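If pulling in itertools is not an option, the same iteration can be sketched with the standard library alone. The helper name adjacent_pairs below is made up for illustration; it pairs each element with its successor by zipping the iterator with a copy of itself advanced by one. Copying the sorted elements into a Vec and calling windows is the other obvious route, and it also handles windows wider than 2.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns every pair of neighbouring elements in sorted order.
fn adjacent_pairs(items: &BTreeSet<i32>) -> Vec<(i32, i32)> {
    // Zip the iterator with a second iterator that starts one element later.
    items
        .iter()
        .zip(items.iter().skip(1))
        .map(|(a, b)| (*a, *b))
        .collect()
}

fn main() {
    let items: BTreeSet<i32> = vec![1, 3, 2].into_iter().collect();

    // Option 1: pair each element with its successor.
    for (a, b) in adjacent_pairs(&items) {
        println!("{} < {}", a, b);
    }

    // Option 2: copy the sorted elements into a Vec and use slice windows.
    let sorted: Vec<i32> = items.iter().copied().collect();
    for window in sorted.windows(2) {
        println!("{:?}", window);
    }
}
```

The zip version does not need the intermediate Vec, but it only yields pairs; for larger tuples the Vec plus windows route is probably simpler.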

Related

How to iterate over a borrowed Option<Vec<_>> treating None as an empty iterator? [duplicate]

I am trying to iterate over an Option<Vec<_>>.
#[derive(Debug)]
pub struct Person {
pub name: Option<String>,
pub age: Option<u64>,
}
#[derive(Debug)]
pub struct Foo {
pub people: Option<Vec<Person>>,
}
Naively I am using
for i in foo.people.iter() {
println!("{:?}", i);
}
Instead of iterating over all the elements of the Vec, I am actually displaying the whole Vec. It is like I am iterating over the only reference of the Option.
Using the following, I am iterating over the Vec content:
for i in foo.people.iter() {
for j in i.iter() {
println!("{:?}", j);
}
}
I am not sure this is the most pleasant syntax; I believe you should unwrap the Option first to actually iterate over the collection.
But then I don't see where you would actually use Option::iter, if it only ever yields a single reference.
Here is the link to the playground.
As mentioned in comments to another answer, I would use the following:
// Either one works
foo.people.iter().flatten()
foo.people.iter().flat_map(identity)
The iter method on Option<T> will return an iterator of one or zero elements.
flatten takes each element (in this case &Vec<Person>) and flattens their nested elements.
This is the same as calling flat_map with identity (std::convert::identity), which maps each element to itself and then flattens the result.
Both paths result in an Iterator<Item = &Person>.
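As a minimal sketch of the flatten approach, assuming the Person and Foo structs from the question (the names helper is invented here purely for illustration):

```rust
#[derive(Debug)]
pub struct Person {
    pub name: Option<String>,
    pub age: Option<u64>,
}

#[derive(Debug)]
pub struct Foo {
    pub people: Option<Vec<Person>>,
}

/// Hypothetical helper: collects the names of all people that have one.
fn names(foo: &Foo) -> Vec<&str> {
    foo.people
        .iter() // yields zero or one &Vec<Person>
        .flatten() // yields each &Person inside that Vec
        .filter_map(|p| p.name.as_deref())
        .collect()
}

fn main() {
    let foo = Foo {
        people: Some(vec![
            Person { name: Some("Ann".to_string()), age: Some(30) },
            Person { name: None, age: None },
        ]),
    };

    // None is treated as an empty collection, so this loop simply does
    // nothing when foo.people is None.
    for person in foo.people.iter().flatten() {
        println!("{:?} {:?}", person.name, person.age);
    }
    println!("{:?}", names(&foo));
}
```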
Option has an iter method that "iterates over the possibly contained value", i.e. provides either the single value in the Option (if option is Some), or no values at all (if the option is None). As such it is useful if you want to treat the option as a container where None means the container is empty and Some means it contains a single element.
To iterate over the underlying element's values, you need to switch from foo.people.iter() to either foo.people.unwrap().iter() or foo.people.unwrap_or_else(Vec::new).iter(), depending on whether you want the program to panic or to not iterate when encountering None people.
Compilable example in the playground.
Use Option::as_deref and Option::unwrap_or_default:
for i in foo.people.as_deref().unwrap_or_default() {
println!("{:?}", i);
}
Option::as_deref converts &Option<Vec<T>> into Option<&[T]>, then unwrap_or_default returns that &[T] or the default (an empty slice). You can then iterate on that directly.
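A small self-contained sketch of this, using a simplified Person with only an age field and a made-up total_age helper (both are assumptions for the example, not part of the question):

```rust
struct Person {
    age: Option<u64>,
}

/// Hypothetical helper: sums the known ages, treating None as "no people".
fn total_age(people: &Option<Vec<Person>>) -> u64 {
    people
        .as_deref() // &Option<Vec<Person>> -> Option<&[Person]>
        .unwrap_or_default() // None becomes an empty slice
        .iter()
        .filter_map(|p| p.age)
        .sum()
}

fn main() {
    let some = Some(vec![Person { age: Some(30) }, Person { age: None }]);
    let none: Option<Vec<Person>> = None;
    println!("{}", total_age(&some));
    println!("{}", total_age(&none));
}
```

Because the result is a plain slice, it can also be indexed or passed to functions that expect &[T], which the flatten-based iterator cannot.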
See also:
Converting from Option<String> to Option<&str>
If you don't need an actual value with an IntoIterator implementation, you can just use an explicit if let instead:
if let Some(x) = foo.people {
for i in x {
// work with i here
}
}

How to convert a vector of vectors into a vector of slices without creating a new object? [duplicate]

I have the following:
enum SomeType {
VariantA(String),
VariantB(String, i32),
}
fn transform(x: SomeType) -> SomeType {
// very complicated transformation, reusing parts of x in order to produce result:
match x {
SomeType::VariantA(s) => SomeType::VariantB(s, 0),
SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
}
}
fn main() {
let mut data = vec![
SomeType::VariantA("hello".to_string()),
SomeType::VariantA("bye".to_string()),
SomeType::VariantB("asdf".to_string(), 34),
];
}
I would now like to call transform on each element of data and store the resulting value back in data. I could do something like data.into_iter().map(transform).collect(), but this will allocate a new Vec. Is there a way to do this in-place, reusing the allocated memory of data? There once was Vec::map_in_place in Rust but it has been removed some time ago.
As a work-around, I've added a Dummy variant to SomeType and then do the following:
for x in &mut data {
let original = ::std::mem::replace(x, SomeType::Dummy);
*x = transform(original);
}
This does not feel right, and I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop. Is there a better way of doing this?
Your first problem is not map, it's transform.
transform takes ownership of its argument, while Vec has ownership of its elements. One of the two has to give, and poking a hole in the Vec would be a bad idea: what if transform panics?
The best fix, thus, is to change the signature of transform to:
fn transform(x: &mut SomeType) { ... }
then you can just do:
for x in &mut data { transform(x) }
Other solutions will be clunky, as they will need to deal with the fact that transform might panic.
No, it is not possible in general because the size of each element might change as the mapping is performed (fn transform(u8) -> u32).
Even when the sizes are the same, it's non-trivial.
In this case, you don't need to create a Dummy variant because creating an empty String is cheap; only 3 pointer-sized values and no heap allocation:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
let old = std::mem::replace(self, VariantA(String::new()));
// Note this line for the detailed explanation
*self = match old {
VariantA(s) => VariantB(s, 0),
VariantB(s, i) => VariantB(s, 2 * i),
};
}
}
for x in &mut data {
x.transform();
}
An alternate implementation that just replaces the String:
impl SomeType {
fn transform(&mut self) {
use SomeType::*;
*self = match self {
VariantA(s) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 0)
}
VariantB(s, i) => {
let s = std::mem::replace(s, String::new());
VariantB(s, 2 * *i)
}
};
}
}
In general, yes, you have to create some dummy value to do this generically and with safe code. Many times, you can wrap your whole element in Option and call Option::take to achieve the same effect.
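The Option::take variant might look roughly like this. It assumes the elements are stored as Option<SomeType>, which is a change to the data layout from the question, and map_in_place is a name chosen here for illustration:

```rust
#[derive(Debug, PartialEq)]
enum SomeType {
    VariantA(String),
    VariantB(String, i32),
}

fn transform(x: SomeType) -> SomeType {
    match x {
        SomeType::VariantA(s) => SomeType::VariantB(s, 0),
        SomeType::VariantB(s, i) => SomeType::VariantB(s, 2 * i),
    }
}

/// Hypothetical helper: applies `transform` to every occupied slot in place.
fn map_in_place(data: &mut [Option<SomeType>]) {
    for slot in data.iter_mut() {
        // take() moves the value out and leaves None behind, so the slot
        // holds a known-good value even if transform panics.
        if let Some(old) = slot.take() {
            *slot = Some(transform(old));
        }
    }
}

fn main() {
    let mut data = vec![
        Some(SomeType::VariantA("hello".to_string())),
        Some(SomeType::VariantB("asdf".to_string(), 34)),
    ];
    map_in_place(&mut data);
    println!("{:?}", data);
}
```

Here None plays the role of the Dummy variant, with the same caveat: after a panic inside transform, the affected slot is left as None.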
See also:
Change enum variant while moving the field to the new variant
Why is it so complicated?
See this proposed and now-closed RFC for lots of related discussion. My understanding of that RFC (and the complexities behind it) is that there's a time period during which your value would be undefined, which is not safe. If a panic were to happen at that exact moment, then when your value is dropped, you might trigger undefined behavior, a bad thing.
If your code were to panic at the commented line, then the value of self is a concrete, known value. If it were some unknown value, dropping that string would try to drop that unknown value, and we are back in C. This is the purpose of the Dummy value - to always have a known-good value stored.
You even hinted at this (emphasis mine):
I have to deal with SomeType::Dummy everywhere else in the code, although it should never be visible outside of this loop
That "should" is the problem. During a panic, that dummy value is visible.
See also:
How can I swap in a new value for a field in a mutable reference to a structure?
Temporarily move out of borrowed content
How do I move out of a struct field that is an Option?
The now-removed implementation of Vec::map_in_place spans almost 175 lines of code, most of it dealing with unsafe code and reasoning about why it is actually safe! Some crates have re-implemented this concept and attempted to make it safe; you can see an example in Sebastian Redl's answer.
You can write a map_in_place in terms of the take_mut or replace_with crates:
fn map_in_place<T, F>(v: &mut [T], f: F)
where
F: Fn(T) -> T,
{
for e in v {
take_mut::take(e, f);
}
}
However, if this panics in the supplied function, the program aborts completely; you cannot recover from the panic.
Alternatively, you could supply a placeholder element that sits in the empty spot while the inner function executes:
use std::mem;
fn map_in_place_with_placeholder<T, F>(v: &mut [T], f: F, mut placeholder: T)
where
F: Fn(T) -> T,
{
for e in v {
let mut tmp = mem::replace(e, placeholder);
tmp = f(tmp);
placeholder = mem::replace(e, tmp);
}
}
If this panics, the placeholder you supplied will sit in the panicked slot.
Finally, you could produce the placeholder on-demand; basically replace take_mut::take with take_mut::take_or_recover in the first version.

How do I mutate and optionally remove elements from a vec without memory allocation?

I have a Player struct that contains a vec of Effect instances. I want to iterate over this vec, decrease the remaining time for each Effect, and then remove any effects whose remaining time reaches zero. So far so good. However, for any effect removed, I also want to pass it to Player's undo_effect() method, before destroying the effect instance.
This is part of a game loop, so I want to do this without any additional memory allocation if possible.
I've tried using a simple for loop and also iterators, drain, retain, and filter, but I keep running into issues where self (the Player) would be mutably borrowed more than once, because modifying self.effects requires a mutable borrow, as does the undo_effect() method. The drain_filter() in nightly looks useful here but it was first proposed in 2017 so not holding my breath on that one.
One approach that did compile (see below), was to use two vectors and alternate between them on each frame. Elements are pop()'ed from vec 1 and either push()'ed to vec 2 or passed to undo_effect() as appropriate. On the next game loop iteration, the direction is reversed. Since each vec will not shrink, the only allocations will be if they grow larger than before.
I started abstracting this as its own struct but want to check if there is a better (or easier) way.
This one won't compile. The self.undo_effect() call would borrow self as mutable twice.
struct Player {
effects: Vec<Effect>
}
impl Player {
fn update(&mut self, delta_time: f32) {
for effect in &mut self.effects {
effect.remaining -= delta_time;
if effect.remaining <= 0.0 {
effect.active = false;
}
}
for effect in self.effects.iter_mut().filter(|e| !e.active) {
self.undo_effect(effect);
}
self.effects.retain(|e| e.active);
}
}
The below compiles ok - but is there a better way?
struct Player {
effects: [Vec<Effect>; 2],
index: usize
}
impl Player {
fn update(&mut self, delta_time: f32) {
let src_index = self.index;
let target_index = if self.index == 0 { 1 } else { 0 };
self.effects[target_index].clear(); // should be unnecessary.
while !self.effects[src_index].is_empty() {
if let Some(x) = self.effects[src_index].pop() {
if x.active {
self.effects[target_index].push(x);
} else {
self.undo_effect(&x);
}
}
}
self.index = target_index;
}
}
Is there an iterator version that works without unnecessary memory allocations? I'd be ok with allocating memory only for the removed elements, since this will be much rarer.
Would an iterator be more efficient than the pop()/push() version?
EDIT 2020-02-23:
I ended up coming back to this and I found a slightly more robust solution, similar to the above but without the danger of requiring a target_index field.
std::mem::swap(&mut self.effects, &mut self.effects_cache);
self.effects.clear();
while !self.effects_cache.is_empty() {
if let Some(x) = self.effects_cache.pop() {
if x.active {
self.effects.push(x);
} else {
self.undo_effect(&x);
}
}
}
Since self.effects_cache is unused outside this method and does not require self.effects_cache to have any particular value beforehand, the rest of the code can simply use self.effects and it will always be current.
The main issue is that you are borrowing a field (effects) of Player and trying to call undo_effect while this field is borrowed. As you noted, this does not work.
You already realized that you could juggle two vectors, but you could actually only juggle one (permanent) vector:
struct Player {
effects: Vec<Effect>
}
impl Player {
fn update(&mut self, delta_time: f32) {
for effect in &mut self.effects {
effect.remaining -= delta_time;
if effect.remaining <= 0.0 {
effect.active = false;
}
}
// Temporarily remove effects from Player.
let mut effects = std::mem::replace(&mut self.effects, vec!());
// Call Player::undo_effects (no outstanding borrows).
// `drain_filter` could also be used, for better efficiency.
for effect in effects.iter_mut().filter(|e| !e.active) {
self.undo_effect(effect);
}
// Restore effects
self.effects = effects;
self.effects.retain(|e| e.active);
}
}
This will not allocate because the default constructor of Vec does not allocate.
On the other hand, the double-vector solution might be more efficient as it allows a single pass over self.effects rather than two. YMMV.
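For comparison, here is a sketch of a single-pass variant of the same idea. It still moves the vector out of self temporarily, but removes expired effects with swap_remove as it goes. The Effect and Player definitions below are stand-ins (the undone counter exists only to make the example observable), and swap_remove does not preserve the order of the remaining effects, which is assumed to be acceptable.

```rust
struct Effect {
    remaining: f32,
}

struct Player {
    effects: Vec<Effect>,
    undone: usize, // stand-in state so undo_effect has something to mutate
}

impl Player {
    fn undo_effect(&mut self, _effect: &Effect) {
        self.undone += 1;
    }

    fn update(&mut self, delta_time: f32) {
        // Temporarily move the effects out of self; the empty Vec left
        // behind does not allocate.
        let mut effects = std::mem::take(&mut self.effects);

        let mut i = 0;
        while i < effects.len() {
            effects[i].remaining -= delta_time;
            if effects[i].remaining <= 0.0 {
                // The last element moves into slot i, so i is not advanced
                // and the moved element is processed on the next iteration.
                let expired = effects.swap_remove(i);
                self.undo_effect(&expired);
            } else {
                i += 1;
            }
        }

        // Put the surviving effects back; the original allocation is reused.
        self.effects = effects;
    }
}

fn main() {
    let mut player = Player {
        effects: vec![
            Effect { remaining: 1.0 },
            Effect { remaining: 0.25 },
            Effect { remaining: 2.0 },
        ],
        undone: 0,
    };
    player.update(0.5);
    println!("{} active, {} undone", player.effects.len(), player.undone);
}
```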
If I understand you correctly, you have two questions:
How can I split a Vec into two Vecs (one whose items fulfill a predicate, the other whose items don't)?
Is it possible to do this without memory overhead?
There are multiple ways of splitting a Vec into two (or more).
You could use Iterator::partition, which consumes an iterator and gives you two distinct collections that can be used further.
There is the unstable Vec::drain_filter function, which does the same but on the Vec itself.
Use splitn (or splitn_mut), which splits your Vec/slice into at most n (2 in your case) subslices at the elements that match a predicate.
Depending on what you want to do, all solutions are applicable and good to use.
Is it possible without memory overhead? Not with the solutions above, because you need to create a second Vec which can hold the filtered items. But there is a solution: you can "sort" the Vec so that the first part contains all the items that fulfill the predicate (e.g. are not expired) and the second part contains those that fail it (are expired). You just need to count the number of items that fulfill the predicate.
Then you can use split_at (or split_at_mut) to split the Vec/slice into two distinct slices. Afterwards you can resize the Vec to the length of the good items and the other ones will be dropped.
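One way to sketch this "sort, count, split" idea with safe code is shown below. The partition_in_place helper is hypothetical (it is not a standard library function), and the integers stand in for the remaining time of each effect.

```rust
/// Hypothetical helper: moves every item that satisfies `keep` to the front
/// of the slice (preserving the order of the kept items) and returns how
/// many items were kept.
fn partition_in_place<T, F>(items: &mut [T], keep: F) -> usize
where
    F: Fn(&T) -> bool,
{
    let mut good = 0;
    for i in 0..items.len() {
        if keep(&items[i]) {
            items.swap(good, i);
            good += 1;
        }
    }
    good
}

fn main() {
    let mut remaining = vec![3, 0, 5, 0, 1];
    let good = partition_in_place(&mut remaining, |r| *r > 0);

    // View both halves without copying anything.
    let (alive, expired) = remaining.split_at(good);
    println!("alive: {:?}, expired: {:?}", alive, expired);

    // drain hands out the expired items by value and shortens the Vec;
    // truncate(good) would simply drop them instead.
    for item in remaining.drain(good..) {
        println!("removing {}", item);
    }
    println!("{:?}", remaining);
}
```

No second Vec is allocated here; the expired items are only moved within the existing buffer before being drained.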
The best answer is this one in C++.
[O]rder the indices vector, create two iterators into the data vector, one for reading and one for writing. Initialize the writing iterator to the first element to be removed, and the reading iterator to one beyond that one. Then in each step of the loop increment the iterators to the next value (writing) and next value not to be skipped (reading) and copy/move the elements. At the end of the loop call erase to discard the elements beyond the last written to position.
The Rust adaptation to your specific problem is to move the removed items out of the vector instead of just writing over them.
An alternative is to use a linked list instead of a vector to hold your Effect instances.

Mutable vectors in struct

I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
def __init__(self, target, weight):
self.target = target
self.weight = weight
class WeightedGraph(object):
def __init__(self, initial_size):
self.adjacency_list = [[] for i in range(initial_size)]
self.size = initial_size
self.edge_count = 0
def add_edge(self, source, target, weight):
self.adjacency_list[source].append(Edge(target, weight))
self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
target: uint,
weight: f64
}
struct WeightedGraph {
adjacency_list: ~Vec<~Vec<Edge>>,
size: uint,
edge_count: int
}
impl WeightedGraph {
fn new(num_nodes: uint) -> WeightedGraph {
let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
WeightedGraph {
adjacency_list: adjacency_list,
size: num_nodes,
edge_count: 0
}
}
fn add_edge(mut self, source: uint, target: uint, weight: f64) {
self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
self.edge_count += 1;
}
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references, you have to use get_mut if you want to modify the data
You only need Vec<Vec<Edge>>, Vec is the right thing to use, ~[] was for that purpose in the past but now means something else (or will, not sure if that is changed already)
You also have to change the signature of add_edge to take &mut self because now you are moving the ownership of self to add_edge and that is not what you want
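For readers on current Rust, where uint, the ~ pointer and Vec::from_fn no longer exist, the same structure might be written along these lines. This is a sketch that applies the advice above (plain Vec<Vec<Edge>>, &mut self) and assumes usize is an acceptable replacement for uint and int:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Edge {
    target: usize,
    weight: f64,
}

struct WeightedGraph {
    adjacency_list: Vec<Vec<Edge>>,
    size: usize,
    edge_count: usize,
}

impl WeightedGraph {
    fn new(num_nodes: usize) -> WeightedGraph {
        WeightedGraph {
            // One empty neighbour list per node; no boxing required.
            adjacency_list: vec![Vec::new(); num_nodes],
            size: num_nodes,
            edge_count: 0,
        }
    }

    // &mut self borrows the graph mutably instead of consuming it.
    fn add_edge(&mut self, source: usize, target: usize, weight: f64) {
        self.adjacency_list[source].push(Edge { target, weight });
        self.edge_count += 1;
    }
}

fn main() {
    let mut g = WeightedGraph::new(2);
    g.add_edge(1, 0, 2.0);
    g.add_edge(0, 1, 3.0); // fine: g was only borrowed by the first call
    println!("{} nodes, {} edges", g.size, g.edge_count);
    println!("{:?}", g.adjacency_list);
}
```

Indexing with [source] panics on an out-of-range node; get_mut(source) returns an Option instead if that case needs to be handled.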

Passing custom slice types by reference

I'm having trouble wrapping my head around how pointers, slices, and interfaces interact in Go. This is what I currently have coded up:
type Loader interface {
Load(string, string)
}
type Foo struct {
a, b string
}
type FooList []Foo
func (l FooList) Load(a, b string) {
l = append(l, Foo{a, b})
// l contains 1 Foo here
}
func Load(list Loader) {
list.Load("1", "2")
// list is still nil here
}
Given this setup, I then try to do the following:
var list FooList
Load(list)
fmt.Println(list)
However, list is always nil here. My FooList.Load function does add an element to the l slice, but that's as far as it gets. The list in Load continues to be nil. I think I should be able to just pass the reference to my slice around and have things append to it. I'm obviously missing something on how to get it to work though.
(Code in http://play.golang.org/p/uuRKjtxs9D)
If you intend your method to make changes, you probably want to use a pointer receiver.
// We also define a method Load on a FooList pointer receiver.
func (l *FooList) Load(a, b string) {
*l = append(*l, Foo{a, b})
}
This has a consequence, though, that a FooList value won't itself satisfy the Loader interface.
var list FooList
Load(list) // You should see a compiler error at this point.
A pointer to a FooList value, though, will satisfy the Loader interface.
var list FooList
Load(&list)
Complete code below:
package main
import "fmt"
/////////////////////////////
type Loader interface {
Load(string, string)
}
func Load(list Loader) {
list.Load("1", "2")
}
/////////////////////////////
type Foo struct {
a, b string
}
// We define a FooList to be a slice of Foo.
type FooList []Foo
// We also define a method Load on a FooList pointer receiver.
func (l *FooList) Load(a, b string) {
*l = append(*l, Foo{a, b})
}
// Given that we've defined the method with a pointer receiver, then a plain
// old FooList won't satisfy the Loader interface... but a FooList pointer will.
func main() {
var list FooList
Load(&list)
fmt.Println(list)
}
I'm going to simplify the problem so it's easier to understand. What is being done there is very similar to this, which also does not work (you can run it here):
type myInt int
func (a myInt) increment() { a = a + 1 }
func increment(b myInt) { b.increment() }
func main() {
var c myInt = 42
increment(c)
fmt.Println(c) // => 42
}
The reason why this does not work is because Go passes parameters by value, as the documentation describes:
In a function call, the function value and arguments are evaluated in the usual
order. After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
In practice, this means that each of a, b, and c in the example above are pointing to different int variables, with a and b being copies of the initial c value.
To fix it, we must use pointers so that we can refer to the same area of memory (runnable here):
type myInt int
func (a *myInt) increment() { *a = *a + 1 }
func increment(b *myInt) { b.increment() }
func main() {
var c myInt = 42
increment(&c)
fmt.Println(c) // => 43
}
Now a and b are both pointers that contain the address of variable c, allowing their respective logic to change the original value. Note that the documented behavior still holds here: a and b are still copies of the original value, but the original value provided as a parameter to the increment function is the address of c.
The case for slices is no different than this. They are references, but the reference itself is provided as a parameter by value, so if you change the reference, the call site will not observe the change since they are different variables.
There's also a different way to make it work, though: implementing an API that resembles that of the standard append function. Again using the simpler example, we might implement increment without mutating the original value, and without using a pointer, by returning the changed value instead:
func increment(i int) int { return i+1 }
You can see that technique used in a number of places in the standard library, such as the strconv.AppendInt function.
It's worth keeping a mental model of how Go's data structures are implemented. That usually makes it easier to reason about behaviour like this.
http://research.swtch.com/godata is a good introduction to the high-level view.
Go is pass-by-value. This is true for both parameters and receivers. If you need to assign to the slice value, you need to use a pointer.
Then I read somewhere that you shouldn't pass pointers to slices since
they are already references
This is not entirely true, and is missing part of the story.
When we say something is a "reference type", including a map type, a channel type, etc., we mean that it is actually a pointer to an internal data structure. For example, you can think of a map type as basically defined as:
// pseudocode
type map *SomeInternalMapStructure
So to modify the "contents" of the associative array, you don't need to assign to a map variable; you can pass a map variable by value and that function can change the contents of the associative array pointed to by the map variable, and it will be visible to the caller. This makes sense when you realize it's a pointer to some internal data structure. You would only assign to a map variable if you want to change which internal associative array you want it to point to.
However, a slice is more complicated. It is a pointer (to an internal array), plus the length and capacity, two integers. So basically, you can think of it as:
// pseudocode
type slice struct {
underlyingArray uintptr
length int
capacity int
}
So it's not "just" a pointer. It is a pointer with respect to the underlying array. But the length and capacity are "value" parts of the slice type.
So if you just need to change an element of the slice, then yes, it acts like a reference type, in that you can pass the slice by value and have the function change an element and it's visible to the caller.
However, when you append() (which is what you're doing in the question), it's different. First, appending affects the length of the slice, and length is one of the direct parts of the slice, not behind a pointer. Second, appending may produce a different underlying array (if the capacity of the original underlying array is not enough, it allocates a new one); thus the array pointer part of the slice might also be changed. Thus it is necessary to change the slice value. (This is why append() returns something.) In this sense, it cannot be regarded as a reference type, because we are not just "changing what it points to"; we are changing the slice directly.
