I'm looking for a data structure that can store a list of elements, while also enabling sub-O(n) lookup of the index given an element and element given an index, as well as insertion at an index.
Elements are dense (integers 0..n) and unique, but unsorted.
For example, in Rust this data structure would be used like so:
fn main() {
let mut list = List::new();
list.extend(vec![5, 2, 0, 4, 1, 3]);
assert_eq!(list.get(2), 0);
assert_eq!(list.get(3), 4);
assert_eq!(list.index_of(0), 2);
assert_eq!(list.index_of(4), 3);
}
O(√n) operations would be acceptable, O(log n) would be ideal. I'm drawing a blank here; any help much appreciated!
This library provides an "IndexedTreeListSet" data structure which implements the three operations required in O(log n):
lookup index -> element
lookup element -> index (aka index of)
insert element at index
It does this, as Mo B. notes, with an ancillary hashmap that maps each element to its node in the tree, which is then traversed upwards to the root. As each node in the tree contains its relative index, the absolute index can be calculated once the root is reached.
I switched from a naïve approach (with O(n) inserts) to this, and the wall-clock execution time for inserts (which occur ~100 times per second) dropped from ~100ms to ~1ms.
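To illustrate the idea of walking from a node up to the root, here is a minimal sketch. It is not the library's implementation: the names Node, IndexedTree and from_slice are made up for this example, it stores subtree sizes instead of relative indices, and it only covers the two lookups (insertion with rebalancing is left out to keep it short).

```rust
use std::collections::HashMap;

/// One tree node, stored in an arena (`Vec<Node>`) and addressed by position.
struct Node {
    value: u32,
    parent: Option<usize>,
    left: Option<usize>,
    right: Option<usize>,
    /// Number of nodes in the subtree rooted here (this node included).
    size: usize,
}

/// Hypothetical structure: a balanced tree whose in-order traversal is the list,
/// plus a hash map from each value to its node.
struct IndexedTree {
    nodes: Vec<Node>,
    root: Option<usize>,
    by_value: HashMap<u32, usize>,
}

impl IndexedTree {
    fn from_slice(values: &[u32]) -> Self {
        let mut nodes = Vec::with_capacity(values.len());
        let root = Self::build(values, None, &mut nodes);
        let by_value = nodes.iter().enumerate().map(|(id, n)| (n.value, id)).collect();
        IndexedTree { nodes, root, by_value }
    }

    /// Builds a balanced subtree from `values` and returns the id of its root.
    fn build(values: &[u32], parent: Option<usize>, nodes: &mut Vec<Node>) -> Option<usize> {
        if values.is_empty() {
            return None;
        }
        let mid = values.len() / 2;
        let id = nodes.len();
        nodes.push(Node { value: values[mid], parent, left: None, right: None, size: values.len() });
        let left = Self::build(&values[..mid], Some(id), nodes);
        let right = Self::build(&values[mid + 1..], Some(id), nodes);
        nodes[id].left = left;
        nodes[id].right = right;
        Some(id)
    }

    fn size(&self, id: Option<usize>) -> usize {
        id.map_or(0, |id| self.nodes[id].size)
    }

    /// index -> element: descend from the root, steering by left-subtree sizes.
    fn get(&self, mut index: usize) -> Option<u32> {
        let mut cur = self.root;
        while let Some(id) = cur {
            let left = self.size(self.nodes[id].left);
            if index < left {
                cur = self.nodes[id].left;
            } else if index == left {
                return Some(self.nodes[id].value);
            } else {
                index -= left + 1;
                cur = self.nodes[id].right;
            }
        }
        None
    }

    /// element -> index: find the node via the hash map, then walk up to the root.
    fn index_of(&self, value: u32) -> Option<usize> {
        let mut id = *self.by_value.get(&value)?;
        let mut index = self.size(self.nodes[id].left);
        while let Some(parent) = self.nodes[id].parent {
            // Coming from a right child: the parent and its left subtree precede us.
            if self.nodes[parent].right == Some(id) {
                index += self.size(self.nodes[parent].left) + 1;
            }
            id = parent;
        }
        Some(index)
    }
}
```

Both lookups visit one node per tree level, so they stay O(log n) as long as the tree is kept balanced.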
Would maintaining a separate Vec<T> and HashMap<T, usize> be adequate? The HashMap lookup is O(1) on average, which beats O(log n).
Downsides of this seem to be that:
You would have to store both the Vec<T> and the HashMap<T, usize> in memory.
Upon insertion or removal of an element, you have to shift the stored index of every later element in the HashMap<T, usize> as well, which makes those operations O(n).
use std::collections::HashMap;
struct List {
index_to_value: Vec<i32>,
value_to_index: HashMap<i32, usize>,
}
impl List {
fn new<I>(index_to_value: I) -> Self
where
I: Into<Vec<i32>>,
{
let index_to_value = index_to_value.into();
let value_to_index = index_to_value
.iter()
.copied()
.enumerate()
.map(|(index, value)| (value, index))
.collect();
Self {
index_to_value,
value_to_index,
}
}
fn get(&self, n: usize) -> Option<i32> {
self.index_to_value.get(n).copied()
}
fn index_of(&self, i: i32) -> Option<usize> {
self.value_to_index.get(&i).copied()
}
}
fn main() {
let list = List::new(vec![5, 2, 0, 4, 1, 3]);
assert_eq!(list.get(2), Some(0));
assert_eq!(list.get(3), Some(4));
assert_eq!(list.index_of(0), Some(2));
assert_eq!(list.index_of(4), Some(3));
}
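The code above does not include insertion or removal, so here is a rough sketch of what they might look like with this layout, mainly to show where the cost comes from. The insert and remove methods are my own additions for illustration, and both panic on an out-of-range index just like Vec::insert and Vec::remove do.

```rust
use std::collections::HashMap;

struct List {
    index_to_value: Vec<i32>,
    value_to_index: HashMap<i32, usize>,
}

impl List {
    fn new() -> Self {
        List { index_to_value: Vec::new(), value_to_index: HashMap::new() }
    }

    /// Inserts `value` at `index`. Every element from `index` onward moves one
    /// slot to the right, so its entry in the map has to be rewritten: O(n).
    fn insert(&mut self, index: usize, value: i32) {
        self.index_to_value.insert(index, value);
        for (i, v) in self.index_to_value.iter().enumerate().skip(index) {
            self.value_to_index.insert(*v, i);
        }
    }

    /// Removes and returns the element at `index`, then fixes up the map
    /// entries of everything that shifted left: also O(n).
    fn remove(&mut self, index: usize) -> i32 {
        let removed = self.index_to_value.remove(index);
        self.value_to_index.remove(&removed);
        for (i, v) in self.index_to_value.iter().enumerate().skip(index) {
            self.value_to_index.insert(*v, i);
        }
        removed
    }
}
```

Lookups stay cheap in both directions, but the fix-up loop is exactly the O(n) cost the question is trying to avoid.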
I have a struct called Cell
pub struct Cell {
x: X, // Some other struct
y: Y, // Some other struct
weight: usize,
}
I was trying to select the top preference cell out of some Row (a collection of Cells).
// Return the top n-matching cells with a positive weight
pub fn select_preference(&mut self) -> Vec<Cell> {
let top = 3;
self.sort();
// After sorting, omit the cells with weight = 0
// And select the top preference cells
self.cells.split(|cell| cell.weight() == 0).take(top)
}
However, I am getting the following (admittedly expected) error:
Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
--> src/lib.rs:35:9
|
29 | pub fn select_preference(&mut self) -> Vec<Cell> {
| --------- expected `Vec<Cell>` because of return type
...
35 | self.cells.split(|cell| cell.weight() == 0).take(top)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `Vec`, found struct `std::iter::Take`
|
= note: expected struct `Vec<Cell>`
found struct `std::iter::Take<std::slice::Split<'_, Cell, [closure#src/lib.rs:35:26: 35:32]>>`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground` due to previous error
I don't know how to convert the Take into Vec<Cell> or &[Cell]. I know the Take is some sort of Iterator but unable to convert it :>
Returning a vector of references to the relevant cells is, I think, the most idiomatic way to do this as it allows using iterators. You can then write:
pub fn select_preference(&mut self) -> Vec<&Cell> {
let top = 3;
self.sort();
self.cells.iter().filter(|cell| cell.weight() != 0).take(top).collect()
}
You can even sort only the iterator and not the Vec<Cell> itself.
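As a sketch of that last point: collect the references first and sort those, leaving the underlying cells untouched. The Cell below is a cut-down stand-in with only a weight field, and I am assuming "top" means highest weight first, since the question does not show how sort orders the cells.

```rust
/// Cut-down stand-in for the question's `Cell` (only the weight matters here).
#[derive(Debug)]
struct Cell {
    weight: usize,
}

/// Returns up to `top` references to the heaviest cells with a positive weight.
/// Only the vector of references is sorted; `cells` itself is not modified.
fn select_preference(cells: &[Cell], top: usize) -> Vec<&Cell> {
    let mut nonzero: Vec<&Cell> = cells.iter().filter(|cell| cell.weight != 0).collect();
    // Assumption: descending by weight.
    nonzero.sort_by(|a, b| b.weight.cmp(&a.weight));
    nonzero.truncate(top);
    nonzero
}
```

A side effect is that the function no longer needs &mut self, because nothing in the original collection is reordered.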
Returning a slice is difficult as a slice must always refer to a contiguous part of a sequence. This will always be the case here due to the sorting of cells, however the iterator methods don't take this into account and so cannot be used. One way you could do it is:
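use std::cmp;
pub fn select_preference(&mut self) -> &[Cell] {
let top = 3;
self.sort();
let mut ret = &self.cells[..cmp::min(self.cells.len(), top)];
// Drop trailing zero-weight cells. `last()` is `None` once the slice is empty,
// so this cannot index out of bounds when every cell has weight 0.
while ret.last().map_or(false, |cell| cell.weight() == 0) {
ret = &ret[..ret.len() - 1];
}
ret
}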
But it is probably evident that this is not very idiomatic Rust.
First, split is probably not what you want -- that creates an iterator where each element is a block of nonzero items. You probably want .iter().filter(|cell| cell.weight() != 0): iterate over the elements of the vector, then keep only those whose weight is nonzero.
To return a vector from an iterator, you need .collect(). However, this would give a Vec<&Cell> -- which doesn't quite match your function signature. Since you want a Vec<Cell>, you also need to clone the elements first to get new cells -- so you can use .cloned() first. That requires adding #[derive(Clone)] to Cell. This is the end result:
#[derive(Clone)]
pub struct Cell {
x: X,
y: Y,
weight: usize,
}
// Return the top n-matching cells with a positive weight
pub fn select_preference(&mut self) -> Vec<Cell> {
let top = 3;
self.sort();
// After sorting, omit the cells with weight = 0
// And select the top preference cells
self.cells.iter().filter(|cell| cell.weight() != 0).take(top).cloned().collect()
}
As a general rule, it's common to always derive Clone for structs of data.
Other designs are possible too -- you can return the Vec<&Cell> directly, as the other answer suggests. Finally, you could return an iterator instead of a Vec; here's how that looks:
pub fn select_preference(&mut self) -> impl Iterator<Item = &Cell> {
let top = 3;
self.sort();
// After sorting, omit the cells with weight = 0
// And select the top preference cells
self.cells.iter().filter(|cell| cell.weight() != 0).take(top)
}
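For anyone wondering what split actually yields compared to filter, here is a small standalone comparison on plain weights instead of cells. The function names are made up for the example.

```rust
/// What `split` produces: the sub-slices between elements matching the predicate.
fn split_blocks(weights: &[usize]) -> Vec<Vec<usize>> {
    weights
        .split(|&w| w == 0)
        .map(|block| block.to_vec())
        .collect()
}

/// What `filter` produces: the individual elements that pass the predicate.
fn nonzero(weights: &[usize]) -> Vec<usize> {
    weights.iter().copied().filter(|&w| w != 0).collect()
}
```

For the input [3, 2, 0, 1, 0], split_blocks gives the blocks [3, 2], [1] and a trailing empty block, while nonzero gives [3, 2, 1]. Calling take(3) on the former takes three blocks, not three cells.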
When manipulating a Vector of Futures, I end up with a nested Vector of Vectors, which I then need to flatten in two iterations.
Dummy code for illustrative purposes:
use std::error::Error;
use futures::future::join_all;
#[tokio::main]
async fn main() {
async fn duplicate(number: i32) -> Result<Vec<i32>, Box<dyn Error>> {
Ok(vec!(number * 2))
}
let my_numbers = vec!(1, 2, 3, 4, 5);
let future_duplicated_evens = my_numbers.into_iter().filter_map(|number| {
if number % 2 == 0 {
Some(duplicate(number))
} else {
None
}
}).collect::<Vec<_>>();
let flattened = join_all(future_duplicated_evens)
.await
.into_iter()
.collect::<Result<Vec<_>, Box<dyn Error>>>()
.unwrap()
.into_iter()
.flatten()
.collect::<Vec<i32>>();
println!("Flattened: {:?}", flattened);
}
In the sample code above, for flattened I first need to collect the Vector of Results from the joined Futures into a Result of a Vec, and after that I need to iterate again just to flatten the Vectors.
My question is, is there a way to collect and flatten in a single iteration?
Since in your example you don't care about handling the errors cleanly, you can simply apply Result::unwrap to each element using map:
let flattened = join_all(future_duplicated_evens)
.await
.into_iter()
.map(Result::unwrap)
.flatten()
.collect::<Vec<i32>>();
If in your real code you do care about handling errors, you can use try_fold instead, which accumulates all of the values in a single vector but aborts as soon as it comes across an error:
let flattened = join_all(future_duplicated_evens)
.await
.into_iter()
.try_fold(Vec::new(), |mut acc, next| {
acc.extend_from_slice(&next?);
Ok::<Vec<i32>, Box<dyn Error>>(acc)
})
.unwrap();
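If you want to try the try_fold approach without pulling in tokio and futures, here is a standalone sketch of the same pattern on an already-resolved Vec of Results. The function name flatten_results is made up, and I use String as the error type only to keep the example easy to test.

```rust
/// Flattens a list of fallible batches in one pass, returning the first error
/// encountered. Stands in for the output of `join_all(...).await`.
fn flatten_results(results: Vec<Result<Vec<i32>, String>>) -> Result<Vec<i32>, String> {
    results
        .into_iter()
        .try_fold(Vec::new(), |mut acc, next| -> Result<Vec<i32>, String> {
            // `?` returns early from the closure, which makes try_fold stop.
            acc.extend(next?);
            Ok(acc)
        })
}
```

Because the error propagates out as a Result here, the caller can decide how to handle it instead of unwrapping.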
I have a struct containing a two-dimensional grid represented by a single Vec<u8> because wasm_bindgen does not support Vec<Vec<T>>. For example, the grid:
0 1
2 3
is stored as a Vec<u8> with elements [0, 1, 2, 3] (row-major order).
I want to be able to resize the grid's width; if the new width is smaller the grid should remove columns from the right, if the new width is larger the grid should fill new columns with zeros. Items may have to be added or removed at multiple locations within the Vec.
To set the grid's width I am chunking the Vec, turning the chunks into vectors, resizing the vectors, and flattening the vectors.
struct Matrix {
grid: Vec<u8>,
width: usize,
height: usize,
}
impl Matrix {
pub fn set_width(&mut self, new_width: usize) {
self.grid = self
.grid
.chunks_exact(self.width)
.flat_map(|chunk| {
let mut chunk_vec = chunk.to_vec();
chunk_vec.resize(new_width, 0);
chunk_vec
})
.collect();
self.width = new_width;
}
}
Is there a more efficient way to do this? I think the chunks are probably allocating a lot of memory on large grid sizes as they all get turned into Vecs.
Setting the height is much easier as the Vec will only need to be extended or truncated:
pub fn set_height(&mut self, new_height: usize) {
self.grid.resize(self.width * new_height, 0);
self.height = new_height;
}
To simply reduce the number of allocations, you can make the closure passed to flat_map return an iterator instead of a Vec:
pub fn set_width(&mut self, new_width: usize) {
use std::iter::repeat;
self.grid = self
.grid
.chunks_exact(self.width)
.flat_map(|chunk| chunk.iter().copied().chain(repeat(0)).take(new_width))
.collect();
self.width = new_width;
}
That is, for each chunk, create an iterator that yields the copied contents of the chunk followed by a repeated string of 0s, and truncate it (take) to total size new_width. This does not require creating any Vecs to store intermediate results and so it allocates less... most likely.
This is okay, but it could be better. FlatMap can't know the size of the internal iterators, so it doesn't give a useful size_hint (see Efficiency of flattening and collecting slices for a similar example). This means the Vec in the solution above starts empty and may have to be grown (reallocated and its contents copied) several times before it is large enough. Instead, we can use Vec::with_capacity first to reserve the correct amount of space, and extend the vector instead of collecting into it:
pub fn set_width(&mut self, new_width: usize) {
use std::iter::repeat;
let mut new_grid = Vec::with_capacity(self.grid.len() / self.width * new_width);
for chunk in self.grid.chunks_exact(self.width) {
new_grid.extend(chunk.iter().copied().chain(repeat(0)).take(new_width));
}
self.grid = new_grid;
self.width = new_width;
}
It is also possible to resize the grid in-place, with at most one reallocation (often reusing the existing one). However, that algorithm is significantly more complicated. The above is how I would write set_width unless it were proven to be a bottleneck.
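For the curious, here is one way the in-place variant could look. Treat it as a sketch under the assumption that grid.len() == width * height always holds; it moves each row with copy_within and relies on processing rows in the right order so that no row is overwritten before it has been moved.

```rust
struct Matrix {
    grid: Vec<u8>,
    width: usize,
    height: usize,
}

impl Matrix {
    /// Resizes the width in place. Assumes `grid.len() == width * height`.
    pub fn set_width(&mut self, new_width: usize) {
        let (old_width, height) = (self.width, self.height);
        if new_width > old_width {
            // Grow first (the only possible reallocation), then move rows
            // starting from the last one, since every row moves to the right.
            self.grid.resize(new_width * height, 0);
            for row in (0..height).rev() {
                let (src, dst) = (row * old_width, row * new_width);
                self.grid.copy_within(src..src + old_width, dst);
                // The padding area may still hold stale data from later rows.
                self.grid[dst + old_width..dst + new_width].fill(0);
            }
        } else if new_width < old_width {
            // Rows move to the left, so go front to back, then cut off the tail.
            for row in 0..height {
                let (src, dst) = (row * old_width, row * new_width);
                self.grid.copy_within(src..src + new_width, dst);
            }
            self.grid.truncate(new_width * height);
        }
        self.width = new_width;
    }
}
```

Each element is moved at most once, so the work is linear in the grid size, but the index bookkeeping is easy to get wrong, which is why I would still start with the simpler version.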
Is the order of the grid points relevant for you? If not, I would use a different serialization from 2D to 1D:
Given you have a matrix like this:
1 2 5
3 4 6
7 8 9
So if the matrix gets wider or taller, you don't move the existing entries at all; you just append the new entries as new “layers” around the matrix you already have.
You could serialize this to [1, 2, 3, 4, 5, 6, 7, 8, 9]
Assuming all indices, and coordinates start at 0:
Given you want to access (n, m), you find the “layer” the value is in by calculating k = max(n, m). The k-th layer starts at index position k * k. Within the layer, the first k elements are the part added on the right side, and the following k + 1 elements are the row added on the bottom.
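The index calculation described above could be written like this. I am assuming the pair is (row, column), both starting at 0, and layer_index is just a name picked for the example.

```rust
/// Maps a (row, col) position to its index in the layered serialization.
/// Assumes zero-based coordinates.
fn layer_index(row: usize, col: usize) -> usize {
    let layer = row.max(col);
    if row < layer {
        // Right-hand part of the layer: col == layer, rows 0..layer.
        layer * layer + row
    } else {
        // Bottom row of the layer: row == layer, cols 0..=layer.
        layer * layer + layer + col
    }
}
```

With the 3x3 example above, layer_index(0, 2) is 4 (the value 5) and layer_index(2, 0) is 6 (the value 7).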
Gave a shot at resizing the grid's width in-place, only reserving new memory once when new_width > self.width:
use std::{cmp::Ordering, iter};
pub fn set_width(&mut self, new_width: usize) {
match new_width.cmp(&self.width) {
Ordering::Greater => {
let width_diff = new_width - self.width;
self.grid.reserve_exact(width_diff * self.height);
for _ in 0..self.height {
self.grid.extend(iter::repeat(0).take(width_diff));
self.grid.rotate_right(new_width);
}
}
Ordering::Less => {
let width_diff = self.width - new_width;
for _ in 0..self.height {
self.grid.truncate(self.grid.len() - width_diff);
self.grid.rotate_right(new_width);
}
}
Ordering::Equal => (),
}
self.width = new_width;
}
I was considering iterating over the Vec's reversed rows and using splice to insert/remove values, but I'm not sure if it's any more efficient.
Using splice:
use std::{cmp::Ordering, iter};
pub fn set_width(&mut self, new_width: usize) {
match new_width.cmp(&self.width) {
Ordering::Greater => {
let width_diff = new_width - self.width;
let width = self.width;
self.grid.reserve_exact(width_diff * self.height);
for i in (0..self.height).rev().map(|n| n * width + width) {
self.grid.splice(i..i, iter::repeat(0).take(width_diff));
}
}
Ordering::Less => {
let width_diff = self.width - new_width;
let width = self.width;
for (start, end) in (1..=self.height)
.rev()
.map(|n| (n * width - width_diff, n * width))
{
self.grid.splice(start..end, iter::empty());
}
}
Ordering::Equal => (),
}
self.width = new_width;
}
I want to be able to repeat a process where the collection that we are iterating over is transformed n times. n is only known at runtime, and can be specified by the user, so we cannot hard-code it into the type.
An approach that uses intermediate data structures by collect-ing between iterations is possible, like so:
let n = 10;
let mut vec1 = vec![1, 2, 3];
{
for _index in 0..n {
let temp_vec = vec1.into_iter().flat_map(|x| vec![x, x * 2]).collect();
vec1 = temp_vec;
}
}
However, this seems wasteful, because we are creating intermediate data structures, so I went on looking for a solution that chains iterators directly.
At first I thought one could just do something like:
let mut iter = vec![1, 2, 3].into_iter();
for index in 0..n {
iter = iter.flat_map(|x| vec![x, x * 2].into_iter());
}
However, this does not work because in Rust, all functions on iterators return their own kind of 'compound iterator' struct. (In for instance Haskell, functions on iterators return the appropriate kind of result iterator, which does not become a 'bigger and bigger compound type'.)
Rewriting this as a recursive function had similar problems because (a) I was returning 'some kind of Iterator' whose type was (near?)-impossible to write out by hand because of the recursion, and (b) this type was different in the base case from the recursive case.
I found this question about conditionally returning either one or the other iterator type, as well as using impl Iterator to indicate that we return some concrete type that implements the Iterator trait, but we do not care about its exact nature.
A similar example to the code in the linked answer has been implemented in the code below as maybe_flatmap. This works.
However, I don't want to run flat_map zero or one time, but rather N times on the incoming iterator. Therefore, I adapted the code to call itself recursively up to a depth of N.
Attempting to do that, then makes the Rust compiler complain with an error[E0720]: opaque type expands to a recursive type:
use either::Either; // 1.5.3
/// Later we want to work with any appropriate items,
/// but for simplicity's sake, just use plain integers for now.
type I = u64;
/// Works, but limited to single level.
fn maybe_flatmap<T: Iterator<Item = I>>(iter: T, flag: bool) -> impl Iterator<Item = I> {
match flag {
false => Either::Left(iter),
true => Either::Right(iter.flat_map(move |x| vec![x, x * 2].into_iter())),
}
}
/// Does not work: opaque type expands to a recursive type!
fn rec_flatmap<T: Iterator<Item = I>>(iter: T, depth: usize) -> impl Iterator<Item = I> {
match depth {
0 => Either::Left(iter),
_ => {
let iter2 = iter.flat_map(move |x| vec![x, x * 2]).into_iter();
Either::Right(rec_flatmap(iter2, depth - 1))
}
}
}
fn main() {
let xs = vec![1, 2, 3, 4];
let xs2 = xs.into_iter();
let xs3 = maybe_flatmap(xs2, true);
let xs4: Vec<_> = xs3.collect();
println!("{:?}", xs4);
let ys = vec![1, 2, 3, 4];
let ys2 = ys.into_iter();
let ys3 = rec_flatmap(ys2, 5);
let ys4: Vec<_> = ys3.collect();
println!("{:?}", ys4);
}
error[E0720]: opaque type expands to a recursive type
--> src/main.rs:16:65
|
16 | fn rec_flatmap<T: Iterator<Item = I>>(iter: T, depth: usize) -> impl Iterator<Item = I> {
| ^^^^^^^^^^^^^^^^^^^^^^^ expands to a recursive type
|
= note: expanded type is `either::Either<T, impl std::iter::Iterator>`
I am stuck.
Since the final answer is going to be (an iterator over) a vector of integers regardless of how often you flat_map, it seems like there ought to be a way of writing this function using only a single concrete return type.
Is this possible? Is there a way out of this situation without resorting to runtime polymorphism?
I believe/hope that a solution without dynamic polymorphism (trait objects or the like) is possible because regardless of how often you call flat_map the end result should (at least morally) have the same type. I hope there is a way to shoehorn the (non-matching) nested FlatMap struct into a matching single static type somehow.
"Is there a way to resolve this without runtime polymorphism?"
No.
To solve it using a trait object:
let mut iter: Box<dyn Iterator<Item = i32>> = Box::new(vec![1, 2, 3].into_iter());
for _ in 0..n {
iter = Box::new(iter.flat_map(|x| vec![x, x * 2].into_iter()));
}
"regardless of how often you call flat_map the end result should (at least morally) have the same type"
I don't know which morality to apply to type systems, but the literal size in memory is (very likely to be) different for FlatMap<...> and FlatMap<FlatMap<...>>. They are different types.
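You can check this for yourself with size_of_val. The sketch below uses two helper functions whose names are made up for the example: one measures a once- and a twice-nested FlatMap, the other wraps the boxed loop from above so it can be called with any n. The exact byte counts depend on the compiler and platform, so I am not quoting any.

```rust
use std::mem::size_of_val;

/// Returns the in-memory sizes of a once-nested and a twice-nested `FlatMap`.
fn flat_map_sizes() -> (usize, usize) {
    let once = vec![1, 2, 3].into_iter().flat_map(|x| vec![x, x * 2]);
    let once_size = size_of_val(&once);
    // The second adaptor stores the first one plus its own front/back state.
    let twice = once.flat_map(|x| vec![x, x * 2]);
    (once_size, size_of_val(&twice))
}

/// The trait-object version: every step is boxed, so the variable keeps a
/// single type (and a single size) no matter how large `n` is.
fn boxed_flat_map(n: usize) -> Box<dyn Iterator<Item = i32>> {
    let mut iter: Box<dyn Iterator<Item = i32>> = Box::new(vec![1, 2, 3].into_iter());
    for _ in 0..n {
        iter = Box::new(iter.flat_map(|x| vec![x, x * 2]));
    }
    iter
}
```

The growing size is the reason the compiler cannot give the loop variable one static type, while the Box is always a pointer plus a vtable pointer.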
See also:
Conditionally iterate over one of several possible iterators
Creating Diesel.rs queries with a dynamic number of .and()'s
How do I iterate over a Vec of functions returning Futures in Rust?
How can I extend the lifetime of a temporary variable inside of an iterator adaptor in Rust?
Why does Iterator::take_while take ownership of the iterator?
To create a new vector with the contents of other vectors, I'm currently doing this:
fn func(a: &Vec<i32>, b: &Vec<i32>, c: &Vec<i32>) {
let abc: Vec<i32> = {
let mut tmp = Vec::with_capacity(a.len() + b.len() + c.len());
tmp.extend(a);
tmp.extend(b);
tmp.extend(c);
tmp
};
// ...
}
Is there a more straightforward / elegant way to do this?
There is a concat method that can be used for this, however the values need to be slices, or borrowable to slices, not &Vec<_> as given in the question.
An example, similar to the question:
fn func(a: &Vec<i32>, b: &Vec<i32>, c: &Vec<i32>) {
let abc: Vec<i32> = [a.as_slice(), b.as_slice(), c.as_slice()].concat();
// ...
}
However, as @mindTree notes, using &[i32] type arguments is more idiomatic and removes the need for conversion, e.g.:
fn func(a: &[i32], b: &[i32], c: &[i32]) {
let abc: Vec<i32> = [a, b, c].concat();
// ...
}
SliceConcatExt::concat is a more general version of your function and can join multiple slices into a Vec. It sums the sizes of the slices to pre-allocate a Vec of the right capacity, then extends it repeatedly.
fn concat(&self) -> Vec<T> {
let size = self.iter().fold(0, |acc, v| acc + v.borrow().len());
let mut result = Vec::with_capacity(size);
for v in self {
result.extend_from_slice(v.borrow())
}
result
}
One possible solution might be to use the Chain iterator:
let abc: Vec<_> = a.iter().chain(b).chain(c).collect();
However, in your example you are borrowing the slices, so we'll need to either deref each borrowed element or use the Cloned iterator to copy each integer. Cloned is probably a bit easier and as efficient as we are working with small Copy data (i32):
let abc: Vec<_> = a.iter().cloned()
.chain(b.iter().cloned())
.chain(c.iter().cloned())
.collect();
Seeing as each of these iterators is an ExactSizeIterator, it should be possible to allocate the exact size for the target Vec up front; however, I'm unaware whether or not this is actually the case in the std implementation (they might be waiting on specialization to land before adding this optimisation).
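One thing that can be checked without reading the std source is what the chained iterator reports through size_hint, since that is what collect has to work with. The sketch below shows that check, plus a manual fallback that reserves the space itself. Both function names are made up for the example, and this says nothing definitive about how collect allocates internally.

```rust
/// Size hint reported by the chained iterator: (lower bound, optional upper bound).
fn chained_size_hint(a: &[i32], b: &[i32], c: &[i32]) -> (usize, Option<usize>) {
    a.iter().chain(b).chain(c).size_hint()
}

/// Fallback that does not depend on the hint: reserve the total up front.
fn concat_preallocated(a: &[i32], b: &[i32], c: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(a.len() + b.len() + c.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out.extend_from_slice(c);
    out
}
```

For slices the hint is exact (both bounds equal the summed lengths), so the information needed for a single allocation is at least available to collect.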