Removing any element from an associative array

I'd like to remove an(y) element from an associative array and process it.
Currently I'm using a RedBlackTree together with .removeAny(), but I don't need the data to be in any order. I could use .keys on the AA, but that always produces an array with all keys. I only need one at a time and will probably change the AA while processing every other element. Is there any other smart way to get exactly one key without (internally) traversing the whole data structure?

There is a workaround, which works as well as using .byKey():
// Returns an arbitrary key of aa; the array must not be empty.
auto anyKey(K, V)(inout ref V[K] aa)
{
    foreach (K k, ref inout(V) v; aa)
        return k; // leave the loop at the first entry
    assert(0, "Associative array hasn't any keys.");
}
For my needs, .byKey().front seems to be fast enough though. I'm not sure whether the workaround is actually faster.
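For comparison, here is a minimal sketch of the same "take any entry" idea in C++ rather than D, assuming a std::unordered_map stands in for the AA. The helper name take_any is made up for illustration. begin() is specified as a constant-time operation, so no traversal of the whole table is needed.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical helper: removes an arbitrary entry from `map` and returns it.
// Precondition: the map is not empty.
template <typename K, typename V>
std::pair<K, V> take_any(std::unordered_map<K, V>& map) {
    assert(!map.empty());
    auto it = map.begin();  // constant time, no full traversal
    std::pair<K, V> entry(it->first, std::move(it->second));
    map.erase(it);          // erasing by iterator is constant time on average
    return entry;
}
```

Because the entry is removed before it is processed, the map can be modified freely while the returned pair is being handled.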

Related

Parallel iteration over array with step size greater than 1

I'm working on a practice program for doing belief propagation stereo vision. The relevant aspect of that here is that I have a fairly long array representing every pixel in an image, and want to carry out an operation on every second entry in the array at each iteration of a for loop: first one half of the entries, and then at the next iteration the other half (this comes from an optimisation described by Felzenszwalb & Huttenlocher in their 2006 paper 'Efficient belief propagation for early vision'). So, you could see it as having an outer for loop which runs a number of times, and for each iteration of that loop I iterate over half of the entries in the array.
I would like to parallelise the operation of iterating over the array like this, since I believe it would be thread-safe to do so, and of course potentially faster. The operation involved updates values inside the data structures representing the neighbouring pixels, which are not themselves used in a given iteration of the outer loop. Originally I just iterated over the entire array in one go, which meant that it was fairly trivial to carry this out - all I needed to do was put .Parallel between Array and .iteri. Changing to operating on every second array entry is trickier, however.
To make the change from simply iterating over every entry, I switched from Array.iteri (fun i p -> ... to using for i in startIndex..2..(ArrayLength - 1) do, where startIndex is either 1 or 0 depending on which one I used last (controlled by toggling a boolean). This means, though, that I can't simply use the really nice .Parallel to make things run in parallel.
I haven't been able to find anything specific about how to implement a parallel for loop in .NET which has a step size greater than 1. The best I could find was a paragraph in an old MSDN document on parallel programming in .NET, but that paragraph only makes a vague statement about transforming an index inside a loop body. I do not understand what is meant there.
I looked at Parallel.For and Parallel.ForEach, as well as creating a custom partitioner, but none of those seemed to include options for changing the step size.
The other option that occurred to me was to use a sequence expression such as
let getOddOrEvenArrayEntries myarray oddOrEven =
    seq {
        let startingIndex =
            if oddOrEven then
                1
            else
                0
        for i in startingIndex..2..(Array.length myarray - 1) do
            yield (i, myarray.[i])
    }
and then using PSeq.iteri from ParallelSeq, but I'm not sure whether it will work correctly with .NET Core 2.2. (Note that, currently at least, I need to know the index of the given element in the array, as it is used as the index into another array during the processing).
How can I go about iterating over every second element of an array in parallel? I.e. iterating over an array using a step size greater than 1?
You could try PSeq.mapi which provides not only a sequence item as a parameter but also the index of an item.
Here's a small example
let res = nums
    |> PSeq.mapi (fun index item -> if index % 2 = 0 then item else item + 1)
You can also have a look over this sampling snippet. Just be sure to substitute Seq with PSeq
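The "transform the index inside the loop body" remark from the question can be sketched as follows. This is an illustration in C++ with plain std::thread rather than F#/.NET, and the function name parallel_strided_for and its parameters are made up for this example. The idea is to run a dense loop over k = 0 .. count-1 in parallel and compute the real index as start + k * step inside the body.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Calls f(i) for i = start, start + step, ... while i < length,
// spreading the calls over up to `workers` threads.
// f must be safe to call concurrently for different i.
template <typename F>
void parallel_strided_for(std::size_t start, std::size_t step,
                          std::size_t length, unsigned workers, F f) {
    assert(step > 0 && workers > 0);
    if (start >= length) return;
    // Number of strided indices in [start, length).
    const std::size_t count = (length - start + step - 1) / step;
    // Dense indices handled by each thread (rounded up).
    const std::size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t first = 0; first < count; first += chunk) {
        const std::size_t last = std::min(first + chunk, count);
        pool.emplace_back([=] {
            for (std::size_t k = first; k < last; ++k)
                f(start + k * step);  // dense k -> strided index
        });
    }
    for (auto& t : pool) t.join();
}
```

With start toggled between 0 and 1 and step = 2, each outer iteration touches one half of the array. The same transformation should apply to any parallel-for that only offers a dense index range.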

How to convert a collection of Vec<ndarray::Array1> into an Array2?

I'm trying to create a 2D array from a Vec of 1D arrays using the ndarray crate. In the current implementation, I have Vec<Array1<u32>> as the collection of 1D arrays, and I'm having a hard time figuring out how to convert it to Array2<u32>. I've tried from_vec() on Vec<Array1<u32>> but it yielded Array1<Array1<u32>>. I thought of using the stack! macro, but I'm not sure how to call it on the above Vec. I'm using ndarray 0.12.1 and Rust 1.31.0.
I'm not hugely familiar with ndarray, but it looks like you have to flatten the data as an intermediate step and then rebuild from that. An iterator would probably have been more efficient but I don't see a method to build from an iterator that also lets you specify a shape.
It likely isn't the most performant way to do this, but it does at least work:
use ndarray::{Array1, Array2};

fn to_array2<T: Copy>(source: &[Array1<T>]) -> Result<Array2<T>, impl std::error::Error> {
    // Number of source arrays (one per row of the result).
    let width = source.len();
    // Concatenate all source arrays into one flat Array1.
    let flattened: Array1<T> = source.into_iter().flat_map(|row| row.to_vec()).collect();
    // Length of each row, assuming equal lengths; an empty slice divides by zero here.
    let height = flattened.len() / width;
    flattened.into_shape((width, height))
}
Note that it can fail if the source arrays have different lengths. This solution is not 100% robust, because it won't fail if one array is shorter and another is longer by the same amount. It is probably worth adding a check to prevent that, but I'll leave that to you.
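The missing length check could look roughly like the sketch below. It is written in C++ with std::vector instead of Rust/ndarray, purely to illustrate the logic; flatten_rows and its parameters are names invented for this example. Every row is compared against the first row's length before anything is copied, so the "shorter row compensated by a longer row" case is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattens `rows` into one row-major buffer `out` and reports the row length
// in `cols`. Returns false (leaving `out` empty) if the rows differ in length.
template <typename T>
bool flatten_rows(const std::vector<std::vector<T>>& rows,
                  std::vector<T>& out, std::size_t& cols) {
    out.clear();
    cols = rows.empty() ? 0 : rows.front().size();
    // Validate first, so a mismatch is caught before any copying.
    for (const auto& row : rows)
        if (row.size() != cols) return false;
    out.reserve(rows.size() * cols);
    for (const auto& row : rows)
        out.insert(out.end(), row.begin(), row.end());
    assert(out.size() == rows.size() * cols);
    return true;
}
```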

C++ doubling index vector of unique ordered relationships

I'm looking for an std-style container, i.e. one with iterators and such, with a structure along the lines of:
template <hashable T, hashable U> class relationships {
    relationships(std::[vector or list]<std::pair<T,U>> list);
    const std::pair<T,U>& operator [](const T index);
    const std::pair<T,U>& operator [](const U index);
};
This is for a two-way mapping into an ordered list of pairs. Every value of both T and U is unique, both types are hashable, and the pairs of related T and U have a specific ordering that iteration should reproduce, i.e. the following loop
for (auto it : relationships) {
    // do something with it
}
would be equivalent to
for (auto it : list) {
    // do something with it
}
I also want efficient lookup, i.e. operator [] should perform like an std::unordered_map for both types.
Finally, I'm looking for solutions based on the Standard Library using C++14 and DO NOT WANT TO USE BOOST.
I have previously seen how to implement a hash map using binary search trees; however, I'm looking for insight into how to efficiently maintain the structure for two indexes plus ordered elements, or for an existing solution if one exists.
My current idea is something using nodes along the lines of
template <typename T, typename U> struct node {
    std::pair<T, U> value; // actual value
    // hashes for sorting binary trees
    size_t hashT;
    size_t hashU;
    // linked list for ordering
    node * prevL;
    node * nextL;
    // binary search tree for type T lookup
    node * parentT;
    node * prevT;
    node * nextT;
    // binary search tree for type U lookup
    node * parentU;
    node * prevU;
    node * nextU;
};
However, that seems inefficient.
My other idea is to store a vector of values, which has order, and then two sorted index vectors of std::pair<size_t, size_t>, with first being the hash and second the index. However, how should I perform a binary search on the index vector and handle hash collisions? I believe this solution would be more memory efficient and similarly fast, but I'm not sure about all the implementation details.
EDIT: I don't need fast insertions, just lookup and iteration; the mapping would be generated once and then used to find relationships.
Regarding performance, it all depends on the algorithm and the types T and U you are trying to use. If you build your data and then do not change it, a simple solution would be the following:
Use a vector<pair<T,U>> for constructing your data.
Duplicate this vector.
Sort one vector according to T, the other according to U.
Use binary search for fast lookup, either in the first vector if looking by T, or in the second if looking by U.
Hide all this behind a construct/sort/access interface. You might not want to use operator[], since you are expected to look into your data structure only once it is sorted.
Of course the solution is not perfect in the sense that you are copying data. However, remember that you will have no extra hidden allocation as you would with a hashmap. For example, for T = U = int, I would think that there will be no more memory in use than a std::unordered_map, since each node needs to store a pointer.
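A minimal C++14 sketch of this build-once, sort, binary-search approach is shown below. It is a variation rather than exactly the steps above: instead of duplicating the pair vector, it keeps the pairs in their original order and sorts two vectors of indices, so iteration order is preserved as the question asks. It assumes T and U provide operator< (no hashing is involved), and the names Relationships, find_first and find_second are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T, typename U>
class Relationships {
public:
    using value_type = std::pair<T, U>;
    using const_iterator = typename std::vector<value_type>::const_iterator;

    explicit Relationships(std::vector<value_type> items)
        : items_(std::move(items)),
          by_first_(items_.size()),
          by_second_(items_.size()) {
        for (std::size_t i = 0; i < items_.size(); ++i)
            by_first_[i] = by_second_[i] = i;
        // Sort the index vectors, not the pairs, so items_ keeps its order.
        std::sort(by_first_.begin(), by_first_.end(),
                  [this](std::size_t a, std::size_t b) {
                      return items_[a].first < items_[b].first;
                  });
        std::sort(by_second_.begin(), by_second_.end(),
                  [this](std::size_t a, std::size_t b) {
                      return items_[a].second < items_[b].second;
                  });
        // Debug check of the stated precondition: all keys are unique.
        auto dup_first = [this](std::size_t a, std::size_t b) {
            return !(items_[a].first < items_[b].first);
        };
        auto dup_second = [this](std::size_t a, std::size_t b) {
            return !(items_[a].second < items_[b].second);
        };
        assert(std::adjacent_find(by_first_.begin(), by_first_.end(), dup_first) == by_first_.end());
        assert(std::adjacent_find(by_second_.begin(), by_second_.end(), dup_second) == by_second_.end());
    }

    // O(log n) lookup by T; returns nullptr when the key is absent.
    const value_type* find_first(const T& key) const {
        auto it = std::lower_bound(
            by_first_.begin(), by_first_.end(), key,
            [this](std::size_t i, const T& k) { return items_[i].first < k; });
        return (it != by_first_.end() && !(key < items_[*it].first))
                   ? &items_[*it] : nullptr;
    }

    // O(log n) lookup by U; returns nullptr when the key is absent.
    const value_type* find_second(const U& key) const {
        auto it = std::lower_bound(
            by_second_.begin(), by_second_.end(), key,
            [this](std::size_t i, const U& k) { return items_[i].second < k; });
        return (it != by_second_.end() && !(key < items_[*it].second))
                   ? &items_[*it] : nullptr;
    }

    // Iteration follows the order of the vector passed to the constructor.
    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }

private:
    std::vector<value_type> items_;      // original order
    std::vector<std::size_t> by_first_;  // indices sorted by .first
    std::vector<std::size_t> by_second_; // indices sorted by .second
};
```

Named lookup functions are used instead of two operator[] overloads, which would be ambiguous when T and U are the same type.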

Pushing element at back of vec in armadillo

How can I push an element to the end of an Armadillo vec? I am adding and removing elements in a sorted list inside a loop, which is very expensive. This is how I currently remove an element, going from vec x to vec x_curr:
x_curr = x(find(x != element))
However, adding an element in the loop is not trivial:
x_curr = x; x_curr << element; x_curr = sort(x_curr);
This is not correct, and it is not very efficient either. What would be the most efficient way to do this in Armadillo? Is there any other solution, e.g. using the STL? I am using this in RcppArmadillo. I could perhaps sort in every iteration. x_curr stores column indices of an arma::mat, i.e. I am going to use it as mat.col(x_curr).
I don't understand your question.
Armadillo is a math library, so it operates on vectors. If you do not know your size, you could allocate a guessed N elements and resize in the common 'times two' idiom as needed, and shrink at the end. If you know the size, well then you have no problem.
The STL has the so-called generic containers and algorithms, but it does not do linear algebra. You need to figure out what you need most, and plan your implementation accordingly.
I am not sure that I understood what you want to do, but if you want to append an element at the end of your vector, you can do it like this:
int sz = yourvector.size();
yourvector.resize(sz+1);
yourvector(sz) = element;
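Since the question also asks about an STL solution, here is a hedged sketch that keeps the indices in a sorted std::vector and uses binary search to find the insert or erase position, so no full re-sort is needed in each iteration. The helper names sorted_insert and sorted_erase are made up for this example, and converting the result back to an Armadillo vector is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inserts `value` into an already sorted vector, keeping it sorted.
// Finding the position is O(log n); shifting the tail is O(n).
template <typename T>
void sorted_insert(std::vector<T>& v, const T& value) {
    assert(std::is_sorted(v.begin(), v.end()));  // debug-only precondition check
    v.insert(std::upper_bound(v.begin(), v.end(), value), value);
}

// Removes one occurrence of `value` from a sorted vector.
// Returns false if the value is not present.
template <typename T>
bool sorted_erase(std::vector<T>& v, const T& value) {
    assert(std::is_sorted(v.begin(), v.end()));  // debug-only precondition check
    auto it = std::lower_bound(v.begin(), v.end(), value);
    if (it == v.end() || value < *it) return false;
    v.erase(it);
    return true;
}
```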

How does change of the state work under the hood in functional languages

I would like to know how functional languages implement, "under the hood", the creation of a new state of, for example, a Vector. When I have a Vector and add another element to it, the old one is still there unchanged, and a new Vector is created that contains the old elements plus one more.
How is this handled internally? Could someone try to explain it? Thank you.
Conceptually, a new Vector is created each time the Vector is extended or modified. However, since the original Vector is unmodified, clever techniques may be used to share structure. See ropes for example.
Also see Okasaki's Purely Functional Data Structures.
If you prepend an element to a linked list, a new linked list is created with the new element as its head and a pointer to the old list as its tail.
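The prepend case can be sketched in C++ with reference-counted nodes. This is only an illustration of structural sharing, not how any particular functional language implements its lists; Node, List, prepend and first are names chosen for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// One immutable list cell: a value plus a shared pointer to the rest.
template <typename T>
struct Node {
    T head;
    std::shared_ptr<const Node<T>> tail;
};

// A list is a (possibly null) pointer to its first cell.
template <typename T>
using List = std::shared_ptr<const Node<T>>;

// Builds a new list whose tail *is* the old list; nothing is copied.
template <typename T>
List<T> prepend(T value, List<T> tail) {
    return std::make_shared<Node<T>>(Node<T>{std::move(value), std::move(tail)});
}

// Returns the first element; the list must not be empty.
template <typename T>
const T& first(const List<T>& list) {
    assert(list != nullptr);
    return list->head;
}
```

Prepending two different values to the same list yields two new lists that share every cell of the old one, and the old list is still valid and unchanged.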
If you add an item to an array, the whole array is usually copied (making it quite inefficient to build up an immutable array incrementally).
However if you only add to the end of each array once, like so:
arr1 = emptyArray()
arr2 = add(arr1, 1)
arr3 = add(arr2, 2)
arr4 = add(arr3, 3)
The whole thing could be optimized, so that arr1, arr2, arr3 and arr4, all have pointers to the same memory, but different lengths. Of course this optimization can only be performed the first time you add to any given array. If you have arr5 = add(arr4, 4) and arr5prime = add(arr4, 42) at least one of them needs to be a copy.
Note that this isn't a common optimization, so you shouldn't expect it to be there unless explicitly stated in the documentation.
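To make the shared-memory idea concrete, here is a small C++ sketch under the assumption that each array handle is a shared buffer plus its own visible length. Arr and at are invented names; emptyArray and add mirror the pseudocode above. It is single-threaded only and not a description of any real language runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// An array handle: a shared buffer plus the length this handle can see.
struct Arr {
    std::shared_ptr<std::vector<int>> buf;
    std::size_t len;
};

Arr emptyArray() { return Arr{std::make_shared<std::vector<int>>(), 0}; }

Arr add(const Arr& a, int x) {
    assert(a.len <= a.buf->size());
    if (a.len == a.buf->size()) {
        // Nobody has appended past this handle yet: grow the shared buffer.
        // Older handles are unaffected because they only look at their prefix.
        a.buf->push_back(x);
        return Arr{a.buf, a.len + 1};
    }
    // The slot after this prefix is already taken: copy the prefix instead.
    auto copy = std::make_shared<std::vector<int>>(a.buf->begin(),
                                                   a.buf->begin() + a.len);
    copy->push_back(x);
    return Arr{copy, a.len + 1};
}

// Reads element i of the prefix visible through this handle.
int at(const Arr& a, std::size_t i) {
    assert(i < a.len);
    return (*a.buf)[i];
}
```

In this sketch the first add on arr4 reuses the buffer, while a second add on the same arr4 (the arr5prime case) falls into the copying branch.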
