How to convert a collection of Vec<ndarray::Array1> into an Array2?

I'm trying to create a 2D array from a Vec of 1D arrays using the ndarray crate. In the current implementation, I have Vec<Array1<u32>> as the collection of 1D arrays, and I'm having a hard time figuring out how to convert it to Array2<u32>. I've tried from_vec() on Vec<Array1<u32>> but it yielded Array1<Array1<u32>>. I thought of using the stack! macro, but I'm not sure how to call it on the above Vec. I'm using ndarray 0.12.1 and Rust 1.31.0.

I'm not hugely familiar with ndarray, but it looks like you have to flatten the data as an intermediate step and then rebuild from that. An iterator would probably have been more efficient but I don't see a method to build from an iterator that also lets you specify a shape.
It likely isn't the most performant way to do this, but it does at least work:
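use ndarray::{Array1, Array2, ShapeError};
fn to_array2<T: Copy>(source: &[Array1<T>]) -> Result<Array2<T>, ShapeError> {
    // One output row per input array.
    let nrows = source.len();
    // Concatenate all rows into a single 1D array (row-major order).
    let flattened: Array1<T> = source.iter().flat_map(|row| row.to_vec()).collect();
    // Guard against dividing by zero when `source` is empty.
    let ncols = if nrows == 0 { 0 } else { flattened.len() / nrows };
    flattened.into_shape((nrows, ncols))
}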
Note that it can fail if the source arrays have different lengths. This solution is not 100% robust, though: only the total element count is checked, so it won't fail if one array is shorter but another array is longer by the same amount. It is probably worth adding a check to prevent that, but I'll leave that to you.
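One way to add that check, sketched in plain standard-library Rust so it runs without ndarray: the hypothetical helper flatten_rows rejects any row whose length differs from the first row, then returns the shape together with the row-major data. That (shape, data) pair is what ndarray's Array2::from_shape_vec takes. The sketch assumes the rows have already been turned into Vec<T> (for example with row.to_vec()).

```rust
/// Flatten equal-length rows into one row-major buffer.
/// Returns (nrows, ncols, data), or Err(i) where i is the index of the
/// first row whose length differs from row 0.
fn flatten_rows<T: Copy>(rows: &[Vec<T>]) -> Result<(usize, usize, Vec<T>), usize> {
    let nrows = rows.len();
    // Every row must match the length of the first one (0 if there are no rows).
    let ncols = rows.first().map_or(0, |r| r.len());
    let mut data = Vec::with_capacity(nrows * ncols);
    for (i, row) in rows.iter().enumerate() {
        if row.len() != ncols {
            return Err(i);
        }
        data.extend_from_slice(row);
    }
    Ok((nrows, ncols, data))
}
```

With this check, a short row that is "compensated" by a long one is reported instead of being silently reshaped.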

Related

Rust reshape a vec into vec<vec>

let a: Vec<f32>;
let mut new: Vec<Vec<f32>>;
Assume that a has a size of n * n. How can I convert it into a 2D vector new?
Obviously, it would be very naive to simply iterate over the vector and do it by hand. Is there any way to do a quick and performant reshape?
No. If you're coming from numpy or similar tooling, numpy stores all arrays as one-dimensional and does arithmetic to make it look multi-dimensional, so performantly reshaping an array really does just involve changing an index somewhere. But a Vec in Rust is a one-dimensional structure, and a Vec<Vec<...>> is a nested datatype with a completely different structure, so you will actually have to copy all of the elements to the new vector. Iterating over the elements is the right way, in this case.
I applaud your efforts to find a better way, but in this case I do believe the answer is to just do it yourself.
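A minimal sketch of doing it yourself, assuming a row-major layout: slice::chunks yields consecutive n-element sub-slices, and each one is copied into its own Vec. The function name reshape_square is hypothetical.

```rust
/// Copy a row-major buffer of n * n elements into n rows of n elements.
fn reshape_square(a: &[f32], n: usize) -> Vec<Vec<f32>> {
    assert_eq!(a.len(), n * n, "input must contain exactly n * n elements");
    if n == 0 {
        // chunks(0) panics, so handle the empty case separately.
        return Vec::new();
    }
    // Each chunk is a borrowed row; to_vec() copies it into a new Vec.
    a.chunks(n).map(|row| row.to_vec()).collect()
}
```

If you only need to read the rows, a.chunks(n) on its own gives borrowed row slices without copying any elements.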

How do I trim a slice and get the indices (indexes) of the result?

This seems like a simple problem:
let slice = " some wacky text. ";
let trimmed = slice.trim();
// how do I get the index of the start and end within the original slice?
Attempt 1
Look for an alternative API. trim wraps trim_matches, which deals with indices internally anyway, so let's copy this code! But it uses std::str::pattern::Pattern, which is unstable and thus can't be used outside std on stable Rust.
Attempt 2
Just use trim and calculate the slice indices from the pointers. There's a nice as_ptr_range method, but it's also unstable; luckily, as the PR says, there's an easy workaround.
let slice_ptr = slice.as_ptr();
let trimmed_ptr = trimmed.as_ptr();
// don't bother about the end (we can use trimmed.len())
Now that we've got some pointers, we need their difference. sub is not the right method for this. offset_from is, but it's unstable (as noted in its design, its only valid use is to compare two pointers into the same slice, which is exactly what we want to do; unfortunately, it's yet another thing held up by the details).
Now, there are hackier ways of solving this problem. We could transmute the pointers to usize (we know the element size is 1 byte, so there's no need to multiply). But this is most likely the Undefined Behaviour kind of unsafe, so let's not go there.
Attempt 3
Edit: the source problem is easy to solve directly, so probably the answer in this case is roll-my-own. Possibly I should just close this.
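A roll-your-own sketch that stays in safe, stable Rust. The first function derives the range purely from lengths: trim_start() tells you how many leading bytes were removed, and trim() then gives the length of the remaining text. The second does the pointer subtraction from Attempt 2, but with an as usize cast rather than a transmute; the cast is ordinary safe code. Both function names are hypothetical. (as_ptr_range and offset_from have since been stabilized, but neither is needed here.)

```rust
use std::ops::Range;

/// Byte range of `s.trim()` within `s`, computed from lengths only.
fn trimmed_range(s: &str) -> Range<usize> {
    // Bytes removed from the front = full length minus length after trim_start().
    let start = s.len() - s.trim_start().len();
    // trim() strips the same leading whitespace, so the trimmed text
    // starts at `start` and is trim().len() bytes long.
    let end = start + s.trim().len();
    start..end
}

/// Same idea via pointer arithmetic: both pointers point into `s`,
/// and casting a pointer to usize with `as` does not require unsafe.
fn trimmed_range_ptr(s: &str) -> Range<usize> {
    let trimmed = s.trim();
    let start = trimmed.as_ptr() as usize - s.as_ptr() as usize;
    start..start + trimmed.len()
}
```

For an all-whitespace input both return an empty range, but the two functions may report it at different offsets.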

Guidelines for choosing Array of Struct or Multiple Arrays

This question has been asked for other languages. I would like to ask this in relation to Julia.
What are the general guidelines for choosing between an array of structs, e.g.
struct Vertex
    x::Real
    y::Real
    gradient_x::Real
    gradient_y::Real
end
myarray::Array{Vertex}
and multiple arrays:
xpositions::Array{<:Real}
ypositions::Array{<:Real}
gradient_x::Array{<:Real}
gradient_y::Array{<:Real}
Are there any performance considerations, or is it just a style/readability issue?
Your struct as it currently stands will perform poorly. From the Performance Tips you should always:
Avoid fields with abstract type
Similarly, you should always prefer Vector{<:Real} to Vector{Real}.
The Julian way to approach this is to parameterize your struct as follows:
struct Vertex{T<:Real}
    x::T
    y::T
    gradient_x::T
    gradient_y::T
end
Given the above, the two approaches discussed in the question will now have roughly similar performance. In practice, it really depends on what kind of operations you want to perform. For example, if you frequently need a vector of just the x fields, then having multiple arrays will probably be a better approach, since any time you need a vector of x fields from a Vector{Vertex} you will need to loop over the structs to allocate it:
xvec = [ v.x for v in vertexvec ]
On the other hand, if your application lends itself to functions called over all four fields of the struct, then your code will be significantly cleaner if you use a Vector{Vertex} and will be just as performant as looping over 4 arrays. Broadcasting in particular will make for nice clean code here. For example, if you have some function:
f(x, y, gradient_x, gradient_y)
then just add the method:
f(v::Vertex) = f(v.x, v.y, v.gradient_x, v.gradient_y)
Now if you want to apply it to vv::Vector{Vertex}, you can just use:
f.(vv)
Remember, user-defined types in Julia are just as performant as "in-built" types. In fact, many types that you might think of as in-built are just defined in Julia itself, much as you are doing here.
So the short summary is: both approaches are performant, so use whichever makes more sense in the context of your application.
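The trade-off is not specific to Julia. As a rough illustration in Rust (the language used elsewhere on this page), here are both layouts side by side. The names Vertex, Vertices, xs_aos, xs_soa and to_soa are illustrative, and the sketch only shows the access pattern; it is not a performance measurement.

```rust
/// Array of structs: each element carries all four fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex<T> {
    x: T,
    y: T,
    gradient_x: T,
    gradient_y: T,
}

/// Struct of arrays: one contiguous Vec per field.
#[derive(Debug, PartialEq)]
struct Vertices<T> {
    x: Vec<T>,
    y: Vec<T>,
    gradient_x: Vec<T>,
    gradient_y: Vec<T>,
}

/// AoS: collecting the x values needs a loop and a new allocation,
/// like `[v.x for v in vertexvec]` in the Julia answer.
fn xs_aos<T: Copy>(vs: &[Vertex<T>]) -> Vec<T> {
    vs.iter().map(|v| v.x).collect()
}

/// SoA: the x values are already a contiguous slice; nothing is copied.
fn xs_soa<T>(vs: &Vertices<T>) -> &[T] {
    &vs.x
}

/// Convert AoS to SoA by copying each field into its own Vec.
fn to_soa<T: Copy>(vs: &[Vertex<T>]) -> Vertices<T> {
    Vertices {
        x: vs.iter().map(|v| v.x).collect(),
        y: vs.iter().map(|v| v.y).collect(),
        gradient_x: vs.iter().map(|v| v.gradient_x).collect(),
        gradient_y: vs.iter().map(|v| v.gradient_y).collect(),
    }
}
```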

Convert Vec<T> to Vec<&T>

I can convert Vec<String> to Vec<&str> this way:
let mut items = Vec::<&str>::new();
for item in &another_items {
    items.push(item);
}
Are there better alternatives?
There are quite a few ways to do it, some have disadvantages, others simply are more readable to some people.
This dereferences s (which is of type &String) to a String place expression, which is then dereferenced through the Deref trait to a str place expression and finally borrowed again as a &str. This pattern is very commonly seen in the compiler, and I therefore consider it idiomatic.
let v2: Vec<&str> = v.iter().map(|s| &**s).collect();
Here the deref function of the Deref trait is passed directly to map. It's pretty neat, but it requires importing the trait with use or spelling out the full path.
let v3: Vec<&str> = v.iter().map(std::ops::Deref::deref).collect();
This uses an as cast, which performs the &String to &str coercion.
let v4: Vec<&str> = v.iter().map(|s| s as &str).collect();
This takes a RangeFull slice of the String (just a slice into the entire String) and takes a reference to it. It's ugly in my opinion.
let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect();
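This uses a coercion to convert a &String into a &str: the type annotation on the inner let triggers deref coercion. It was once expected that a type ascription expression (s: &str) could replace this, but type ascription has since been removed from Rust.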
let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect();
The following (thanks @huon-dbaupp) uses the AsRef trait, which exists for cheap reference-to-reference conversions such as &String to &str. There are two ways to use it, and again, which version is prettier is entirely subjective.
let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect();
and
let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect();
My bottom line: use the v8 solution, since it most explicitly expresses what you want.
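A quick way to convince yourself that these variants are interchangeable: the sketch below (the function name all_variants is made up for illustration) builds the same Vec<&str> each way from a slice of Strings, including String::as_str, which comes up further down.

```rust
use std::ops::Deref;

/// Build the same Vec<&str> with each variant and return them all.
fn all_variants(v: &[String]) -> Vec<Vec<&str>> {
    let v2: Vec<&str> = v.iter().map(|s| &**s).collect(); // explicit reborrow
    let v3: Vec<&str> = v.iter().map(Deref::deref).collect(); // Deref::deref as a function
    let v4: Vec<&str> = v.iter().map(|s| s as &str).collect(); // `as` coercion
    let v5: Vec<&str> = v.iter().map(|s| &s[..]).collect(); // full-range slice
    let v6: Vec<&str> = v.iter().map(|s| { let s: &str = s; s }).collect(); // let coercion
    let v7: Vec<&str> = v.iter().map(|s| s.as_ref()).collect(); // AsRef method call
    let v8: Vec<&str> = v.iter().map(AsRef::as_ref).collect(); // AsRef as a function
    let v9: Vec<&str> = v.iter().map(String::as_str).collect(); // inherent String::as_str
    vec![v2, v3, v4, v5, v6, v7, v8, v9]
}
```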
The other answers simply work. I just want to point out that if you are trying to convert the Vec<String> into a Vec<&str> only to pass it to a function taking Vec<&str> as argument, consider revising the function signature as:
fn my_func<T: AsRef<str>>(list: &[T]) { ... }
instead of:
fn my_func(list: &Vec<&str>) { ... }
As pointed out in the question "Function taking both owned and non-owned string collections", this way both Vec<String> and Vec<&str> work without any conversion.
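A minimal sketch of such a signature in use; total_len is a hypothetical stand-in for my_func that simply sums the string lengths.

```rust
/// Accepts &[String], &[&str], or any other slice of string-like items.
fn total_len<T: AsRef<str>>(list: &[T]) -> usize {
    // as_ref() yields a &str for each element, whatever T is.
    list.iter().map(|s| s.as_ref().len()).sum()
}
```

Both an owned and a borrowed collection can be passed as-is, with no intermediate Vec<&str>.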
All of the answers idiomatically use iterators and collecting instead of a loop, but do not explain why this is better.
In your loop, you first create an empty vector and then push into it. Rust makes no guarantees about its growth strategy, but I believe the current strategy is to double the capacity whenever it is exceeded. Starting from a capacity of one, a final length of 20 would mean one allocation and five reallocations (the standard library may start from a larger minimum capacity, so the real count can be lower).
Iterating over a vector produces an iterator that has a "size hint". In this case, the iterator implements ExactSizeIterator, so it knows exactly how many elements it will return. map preserves this, and collect uses it to allocate enough space in one go.
You can also manually do this with:
let mut items = Vec::<&str>::with_capacity(another_items.len());
for item in &another_items {
    items.push(item);
}
Heap allocations and reallocations are probably the most expensive part of this entire thing by far; far more expensive than taking references or writing or pushing to a vector when no new heap allocation is involved. It wouldn't surprise me if pushing a thousand elements onto a vector allocated for that length in one go were faster than pushing 5 elements that required 2 reallocations and one allocation in the process.
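To observe this without relying on a particular growth factor, the sketch below (growths_when_pushing is a hypothetical helper) counts how many times capacity() changes while pushing. Vec::with_capacity guarantees room for at least n elements, so the presized case never grows; the count for the empty-start case is an implementation detail and is deliberately not pinned down.

```rust
/// Count how often the buffer grows while pushing `n` items, by watching
/// `capacity()` change. Exact numbers depend on the standard library's
/// (unspecified) growth strategy.
fn growths_when_pushing(n: usize, presize: bool) -> usize {
    let mut v: Vec<&str> = if presize { Vec::with_capacity(n) } else { Vec::new() };
    let mut last_cap = v.capacity();
    let mut growths = 0;
    for _ in 0..n {
        v.push("x");
        // A capacity change means the buffer was (re)allocated.
        if v.capacity() != last_cap {
            growths += 1;
            last_cap = v.capacity();
        }
    }
    growths
}
```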
Another unsung advantage is that the collect versions don't need a mutable variable, and it's better not to declare one when it isn't needed.
another_items.iter().map(|item| item.deref()).collect::<Vec<&str>>()
To call deref() like this, you must first import the trait with use std::ops::Deref;
This one uses collect:
let strs: Vec<&str> = another_items.iter().map(|s| s as &str).collect();
Here is another option:
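use std::iter::FromIterator;
let strs: Vec<&str> = Vec::from_iter(another_items.iter().map(String::as_str));
Note that String::as_str is stable since Rust 1.7. On the 2021 edition and later, FromIterator is in the prelude, so the use line can be dropped.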

What's the idiomatic way to append a slice to a vector?

I have a &[u8] slice and I'd like to append it to a Vec<u8> with minimal copying. Here are two approaches that I know work:
let s = [0u8, 1u8, 2u8];
let mut v = Vec::new();
v.extend(s.iter().map(|&i| i));
v.extend(s.to_vec().into_iter()); // allocates an extra copy of the slice
Is there a better way to do this in Rust stable? (rustc 1.0.0-beta.2)
There's a method that does exactly this: Vec::extend_from_slice
Example:
let s = [0u8, 1, 2];
let mut v = Vec::new();
v.extend_from_slice(&s);
Alternatively, extend from an iterator that copies each element:
v.extend(s.iter().cloned());
That is effectively equivalent to using .map(|&i| i), and it does minimal copying.
The problem is that you absolutely cannot avoid copying in this case. You cannot simply move the values, because a slice does not own its contents, so it can only hand out copies.
Now, that said, there are two things to consider:
Rust tends to inline rather aggressively; there is enough information in this code for the compiler to just copy the values directly into the destination without any intermediate step.
Closures in Rust aren't like closures in most other languages: they don't require heap allocation and can be directly inlined, thus making them no less efficient than hard-coding the behaviour directly.
Do keep in mind that the above two are dependent on optimisation: they'll generally work out for the best, but aren't guaranteed.
But having said that... what you're actually trying to do here in this specific example is append a stack-allocated array which you do own. I'm not aware of any library code that can actually take advantage of this fact (support for array values is rather weak in Rust at the moment), but theoretically, you could effectively create an into_iter() equivalent using unsafe code... but I don't recommend it, and it's probably not worth the hassle.
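Since this answer was written, arrays have gained a by-value IntoIterator implementation (Rust 1.53), so an owned array can now be moved into a Vec with extend, no unsafe code required. A small sketch contrasting the two cases; append_slice and append_array are illustrative names, and the const-generic signature assumes a compiler recent enough to support it.

```rust
/// Append a borrowed slice: the elements must be copied.
fn append_slice(v: &mut Vec<u8>, s: &[u8]) {
    v.extend_from_slice(s);
}

/// Append an owned array: it is iterated by value, so the elements
/// are moved into the Vec rather than cloned.
fn append_array<T, const N: usize>(v: &mut Vec<T>, a: [T; N]) {
    v.extend(a);
}
```

The second function works even for non-Copy types such as String, which shows the elements really are moved.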
I can't speak for the full performance implications, but v + &s works on beta, and I believe it amounts to pushing each value onto the original Vec.
