How to best keep the first N elements in Vec and release unused capacity? - vector

I want to keep only the first 2 elements in a Vec and release any unused capacity. Here is my current solution:
let mut data = vec![1, 2, 3, 4, 5, 6]; // produced by another function
data.truncate(2);
data.shrink_to_fit();
Is there a better way to do this?

Truncating and shrinking is the best way. Releasing unused capacity is a distinct operation; there's no way around it. Rust doesn't do it automatically since you might be removing and then adding more elements.

Rust docs https://static.rust-lang.org/doc/master/std/vec/struct.Vec.html#guarantees
In general, Vec's allocation details are subtle enough that it is strongly recommended that you only free memory allocated by a Vec by creating a new Vec and dropping it.
I'm a Rust noob, but it seems to say that the solution would be:
let v = vec![v[0], v[1]];
(or vec![&v[0], &v[1]] if appropriate);
BTW. https://static.rust-lang.org/doc/master/std/vec/struct.Vec.html#guarantees also says:
push and insert will never (re)allocate if the reported capacity is sufficient. push and insert will (re)allocate if len()==capacity(). That is, the reported capacity is completely accurate, and can be relied on. It can even be used to manually free the memory allocated by a Vec if desired.
I don't understand how to use this information :)

Related

Rf_allocVector only allocates and does not zero out memory

Original motivation behind this is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without either incurring the cost of a zeroing out nor the cost of a deep copy.
Originally I had thought that there might be some way to take heap allocated array, make it aware to R's gc system and then wrap it with other data to create a "Rcpp::NumericVector" but it seems like that that's not possible - or doable with my current knowledge.
However and correct me if I'm wrong it looks like simply constructing a NumericVector with a size N and then using it as an N sized allocation will call R.h's Rf_allocVector and that itself does not either zero out the allocated array - I tested it on a small C program that gets dyn.loaded into R and it looks like garbage values. I also took a peek at the assembly and there doesn't seem to be any zeroing out.
Can anyone confirm this or offer any alternate solution?
Welcome to StackOverflow.
You marked this rcpp but that is a function from the C API of R -- whereas the Rcpp API offers you its constructors which do in fact set the memory tp zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own. And it has the memory set to zero
as we use R_Calloc, which is documented in Writing R Extension to setting the memory to zero. (We may also use memcpy() explicitly, you can check the sources.)
So in short, you just have yourself confused over what the C API of R, as well as Rcpp offer, and what is easiest to use when. Keep reading documentation, running and writing examples, and studying existing code. It's all out there!

How do I trim a slice and get the indices (indexes) of the result?

This seems like a simple problem:
let slice = " some wacky text. ";
let trimmed = slice.trim();
// how do I get the index of the start and end within the original slice?
Attempt 1
Look for an alternative API. trim wraps trim_matches which deals with indices internally anyway: so lets copy this code! But this uses std::str::pattern::Pattern which is unstable, thus can't be used outside std in stable Rust.
Attempt 2
Just use trim and calculate the slice indices from the pointers. There's a nice as_ptr_range method, but its also unstable; luckily as the PR says there's an easy work-around.
let slice_ptr = slice.as_ptr();
let trimmed_ptr = trimmed.as_ptr();
// don't bother about the end (we can use trimmed.len())
Now that we've got some pointers, we need their difference. sub is not the right method for this. offset_from is, but it's unstable (as noted in the design, it's only valid use is to compare two pointers into the same slice, which is exactly what we want to do, unfortunately it's yet another thing delayed by the details).
Now, there are hackier ways of solving this problem. We could transmute the pointers to usize (we know the element size is 1 byte, so no need to multiply). But this is most likely the Undefined Behaviour type of unsafe, so lets not go there.
Attempt 3
Edit: the source problem is easy to solve directly, so probably the answer in this case is roll-my-own. Possibly I should just close this.

dropping singleton dimensions in julia

Just playing around with Julia (1.0) and one thing that I need to use a lot in Python/numpy/matlab is the squeeze function to drop the singleton dimensions.
I found out that one way to do this in Julia is:
a = rand(3, 3, 1);
a = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
The second line seems a bit cumbersome and not easy to read and parse instantly (this could also be my bias that I bring from other languages). However, I wonder if this is the canonical way to do this in Julia?
The actual answer to this question surprised me. What you are asking could be rephrased as:
why doesn't dropdims(a) remove all singleton dimensions?
I'm going to quote Tim Holy from the relevant issue here:
it's not possible to have squeeze(A) return a type that the compiler
can infer---the sizes of the input matrix are a runtime variable, so
there's no way for the compiler to know how many dimensions the output
will have. So it can't possibly give you the type stability you seek.
Type stability aside, there are also some other surprising implications of what you have written. For example, note that:
julia> f(a) = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
f (generic function with 1 method)
julia> f(rand(1,1,1))
0-dimensional Array{Float64,0}:
0.9939103383167442
In summary, including such a method in Base Julia would encourage users to use it, resulting in potentially type-unstable code that, under some circumstances, will not be fast (something the core developers are strenuously trying to avoid). In languages like Python, rigorous type-stability is not enforced, and so you will find such functions.
Of course, nothing stops you from defining your own method as you have. And I don't think you'll find a significantly simpler way of writing it. For example, the proposition for Base that was not implemented was the method:
function squeeze(A::AbstractArray)
singleton_dims = tuple((d for d in 1:ndims(A) if size(A, d) == 1)...)
return squeeze(A, singleton_dims)
end
Just be aware of the potential implications of using it.
Let me simply add that "uncontrolled" dropdims (drop any singleton dimension) is a frequent source of bugs. For example, suppose you have some loop that asks for a data array A from some external source, and you run R = sum(A, dims=2) on it and then get rid of all singleton dimensions. But then suppose that one time out of 10000, your external source returns A for which size(A, 1) happens to be 1: boom, suddenly you're dropping more dimensions than you intended and perhaps at risk for grossly misinterpreting your data.
If you specify those dimensions manually instead (e.g., dropdims(R, dims=2)) then you are immune from bugs like these.
You can get rid of tuple in favor of a comma ,:
dropdims(a, dims = (findall(size(a) .== 1)...,))
I'm a bit surprised at Colin's revelation; surely something relying on 'reshape' is type stable? (plus, as a bonus, returns a view rather than a copy).
julia> function squeeze( A :: AbstractArray )
keepdims = Tuple(i for i in size(A) if i != 1);
return reshape( A, keepdims );
end;
julia> a = randn(2,1,3,1,4,1,5,1,6,1,7);
julia> size( squeeze(a) )
(2, 3, 4, 5, 6, 7)
No?

What's the idiomatic way to append a slice to a vector?

I have a slice of &[u8] and I'd like to append it to a Vec<u8> with minimal copying. Here are two approaches that I know work:
let s = [0u8, 1u8, 2u8];
let mut v = Vec::new();
v.extend(s.iter().map(|&i| i));
v.extend(s.to_vec().into_iter()); // allocates an extra copy of the slice
Is there a better way to do this in Rust stable? (rustc 1.0.0-beta.2)
There's a method that does exactly this: Vec::extend_from_slice
Example:
let s = [0u8, 1, 2];
let mut v = Vec::new();
v.extend_from_slice(&s);
v.extend(s.iter().cloned());
That is effectively equivalent to using .map(|&i| i) and it does minimal copying.
The problem is that you absolutely cannot avoid copying in this case. You cannot simply move the values because a slice does not own its contents, thus it can only take a copy.
Now, that said, there are two things to consider:
Rust tends to inline rather aggressively; there is enough information in this code for the compiler to just copy the values directly into the destination without any intermediate step.
Closures in Rust aren't like closures in most other languages: they don't require heap allocation and can be directly inlined, thus making them no less efficient than hard-coding the behaviour directly.
Do keep in mind that the above two are dependent on optimisation: they'll generally work out for the best, but aren't guaranteed.
But having said that... what you're actually trying to do here in this specific example is append a stack-allocated array which you do own. I'm not aware of any library code that can actually take advantage of this fact (support for array values is rather weak in Rust at the moment), but theoretically, you could effectively create an into_iter() equivalent using unsafe code... but I don't recommend it, and it's probably not worth the hassle.
I can't speak for the full performance implications, but v + &s will work on beta, which I believe is just similar to pushing each value onto the original Vec.

"Adding" a value to a tuple?

I am attempting to represent dice rolls in Julia. I am generating all the rolls of a ndsides with
sort(collect(product(repeated(1:sides, n)...)), by=sum)
This produces something like:
[(1,1),(2,1),(1,2),(3,1),(2,2),(1,3),(4,1),(3,2),(2,3),(1,4) … (6,3),(5,4),(4,5),(3,6),(6,4),(5,5),(4,6),(6,5),(5,6),(6,6)]
I then want to be able to reasonably modify those tuples to represent things like dropping the lowest value in the roll or adding a constant number, etc., e.g., converting (2,5) into (10,2,5) or (5,).
Does Julia provide nice functions to easily modify (not necessarily in-place) n-tuples or will it be simpler to move to a different structure to represent the rolls?
Thanks.
Tuples are immutable, so you can't modify them in-place. There is very good support for other mutable data structures, so there aren't many methods that take a tuple and return a new, slightly modified copy. One way to do this is by splatting a section of the old tuple into a new tuple, so, for example, to create a new tuple like an existing tuple t but with the first element set to 5, you would write: tuple(5, t[2:end]...). But that's awkward, and there are much better solutions.
As spencerlyon2 suggests in his comment, a one dimensional Array{Int,1} is a great place to start. You can take a look at the Data Structures manual page to get an idea of the kinds of operations you can use; one-dimensional Arrays are iterable, indexable, and support the dequeue interface.
Depending upon how important performance is and how much work you're doing, it may be worthwhile to create your own data structure. You'll be able to add your own, specific methods (e.g., reroll!) for that type. And by taking advantage of some of the domain restrictions (e.g., if you only ever want to have a limited number of dice rolls), you may be able to beat the performance of the general Array.
You can construct a new tuple based on spreading or slicing another:
julia> b = (2,5)
(2, 5)
julia> (10, b...)
(10, 2, 5)
julia> b[2:end]
(5,)

Resources