Are Rust multi-dimensional arrays row-major and tightly packed?

I'm writing a 3D math library for my project, and I want to know whether Rust arrays are column-major or row-major. For example, I have a two-dimensional array used as a matrix, and I want to pass it to a C library (like OpenGL or Vulkan); for those libraries it is important to have a tightly packed, column-major array.

Well, let's find out:
let arr: [[i8; 2]; 2] = [[1, 2], [8, 9]];
println!(
    "{:?} {:?} {:?} {:?}",
    &arr[0][0] as *const _,
    &arr[0][1] as *const _,
    &arr[1][0] as *const _,
    &arr[1][1] as *const _,
);
This prints, for example, 0x7fff5584ae74 0x7fff5584ae75 0x7fff5584ae76 0x7fff5584ae77. So: yes, these arrays with a length known at compile time are tightly packed and (by the common definition of the terms) row-major.
Note: the test above doesn't guarantee that this always works! You can read more about this topic here.
But: usually you use heap-allocated arrays, since you can't know the length beforehand. For that purpose it's idiomatic to use Vec. There are no special layout rules for this type, though, so a Vec<Vec<T>> is not tightly packed! For that reason Vec<Vec<T>> is not the idiomatic choice here -- you should use a single flat Vec<T> and do the index calculation yourself.
Of course, writing the index calculation multiple times is not a good solution either. Instead, you should define a wrapper type which does the indexing for you. But as Sebastian Redl already mentioned: you are not the only one having this problem, and types for exactly this purpose already exist.
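A minimal sketch of such a wrapper over a flat, row-major Vec (the type and method names here are illustrative, not taken from any particular crate):
struct Matrix {
    data: Vec<f32>,
    cols: usize,
}

impl Matrix {
    fn new(rows: usize, cols: usize) -> Self {
        // One contiguous allocation holding rows * cols elements.
        Matrix { data: vec![0.0; rows * cols], cols }
    }

    // Row-major layout: element (row, col) lives at index row * cols + col.
    fn get(&self, row: usize, col: usize) -> f32 {
        self.data[row * self.cols + col]
    }

    fn set(&mut self, row: usize, col: usize, value: f32) {
        self.data[row * self.cols + col] = value;
    }

    // The flat buffer can be handed to a C API as a single pointer.
    fn as_ptr(&self) -> *const f32 {
        self.data.as_ptr()
    }
}

fn main() {
    let mut m = Matrix::new(2, 3);
    m.set(1, 2, 42.0);
    assert_eq!(m.get(1, 2), 42.0);
    let _ptr = m.as_ptr();
}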

Related

How can I load all entries of a Vec<T> of arbitrary length onto the stack?

I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.
You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]

fn example<T>(v: Vec<T>) {
    let s: [T] = *v.into_boxed_slice();
    dbg!(std::mem::size_of_val(&s));
}

fn main() {
    let x = vec![42; 100];
    example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec
Sure you can.
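For instance, a Vec whose length is only decided at run time converts just fine (a minimal sketch; the length computation is illustrative):
fn main() {
    // Length only known at run time.
    let n = std::env::args().count() + 10;
    let v: Vec<i32> = vec![0; n];

    // into_boxed_slice shrinks the capacity to fit and hands back
    // ownership of the same heap allocation as a Box<[i32]>.
    let boxed: Box<[i32]> = v.into_boxed_slice();
    println!("boxed slice of length {}", boxed.len());
}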
Vec [...] seems to stride over pointers on the heap, dereferencing at each entry
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
for fast access
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.

How are elements of a vector left-shifted in Rust?

Is there a safe way to left-shift elements of a vector in Rust? (vec![1, 2, 3] becomes vec![3] when left-shifted two places). I'm dealing with Copy types, and I don't want to pay a penalty higher than what I would with a memmove.
The only solution I've found is unsafe: use memmove directly via ptr::copy.
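For reference, that unsafe baseline looks roughly like this (a sketch; shift_left is an illustrative name, and the T: Copy bound keeps the set_len truncation sound because Copy types have no destructors):
use std::ptr;

fn shift_left<T: Copy>(v: &mut Vec<T>, n: usize) {
    assert!(n <= v.len());
    let remaining = v.len() - n;
    unsafe {
        // Overlapping copy of the tail to the front, equivalent to memmove.
        ptr::copy(v.as_ptr().add(n), v.as_mut_ptr(), remaining);
        v.set_len(remaining);
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    shift_left(&mut v, 2);
    assert_eq!(v, vec![3]);
}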
I would use Vec::drain.
You can call it with a range of the elements you want to remove, and it'll shift the remaining elements over afterwards. Example:
fn main() {
    let mut v = vec![1, 2, 3];
    v.drain(0..2);
    assert_eq!(vec![3], v);
}
One other note:
I'm dealing with Copy types, and I don't want to pay a penalty higher than what I would with a memmove.
Worth noting that moving is always a memcpy in Rust, so the Copy vs non-Copy distinction doesn't matter here. It'd be the same if the types weren't Copy.

What's the idiomatic way to append a slice to a vector?

I have a &[u8] slice and I'd like to append it to a Vec<u8> with minimal copying. Here are two approaches that I know work:
let s = [0u8, 1u8, 2u8];
let mut v = Vec::new();
v.extend(s.iter().map(|&i| i));
v.extend(s.to_vec().into_iter()); // allocates an extra copy of the slice
Is there a better way to do this in Rust stable? (rustc 1.0.0-beta.2)
There's a method that does exactly this: Vec::extend_from_slice
Example:
let s = [0u8, 1, 2];
let mut v = Vec::new();
v.extend_from_slice(&s);
v.extend(s.iter().cloned());
That is effectively equivalent to using .map(|&i| i) and it does minimal copying.
The problem is that you absolutely cannot avoid copying in this case. You cannot simply move the values out, because a slice does not own its contents; all you can do is copy them.
Now, that said, there are two things to consider:
Rust tends to inline rather aggressively; there is enough information in this code for the compiler to just copy the values directly into the destination without any intermediate step.
Closures in Rust aren't like closures in most other languages: they don't require heap allocation and can be directly inlined, thus making them no less efficient than hard-coding the behaviour directly.
Do keep in mind that the above two are dependent on optimisation: they'll generally work out for the best, but aren't guaranteed.
But having said that... what you're actually trying to do here in this specific example is append a stack-allocated array which you do own. I'm not aware of any library code that can actually take advantage of this fact (support for array values is rather weak in Rust at the moment), but theoretically you could build an into_iter() equivalent using unsafe code. I don't recommend it, though; it's probably not worth the hassle.
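For what it's worth, on current Rust (1.53 and later) arrays implement IntoIterator by value, so the owned-array case no longer needs unsafe code; a minimal sketch on a modern toolchain:
fn main() {
    let s = [String::from("a"), String::from("b"), String::from("c")];
    let mut v: Vec<String> = Vec::new();
    // The array is consumed by value, so the Strings are moved into
    // the Vec rather than cloned or copied.
    v.extend(s);
    assert_eq!(v.len(), 3);
}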
I can't speak for the full performance implications, but v + &s will work on beta, which I believe is effectively the same as pushing each value onto the original Vec.

Explaining C declarations in Rust

I need to rewrite these C declarations in Go and Rust for a set of practice problems I am working on. I figured out the Go part, but I am having trouble with the Rust part. Any ideas or help with writing these in Rust?
double *a[n];
double (*b)[n];
double (*c[n])();
double (*d())[n];
Assuming n is a constant:
let a: [*mut f64; n]; // double *a[n];
let b: *mut [f64; n]; // double (*b)[n];
let c: [fn() -> f64; n]; // double (*c[n])();
fn d() -> *mut [f64; n]; // double (*d())[n];
These are rather awkward and unusual types in any language. Rust's syntax, however, makes these declarations a lot easier to read than C's syntax does.
Note that d in C is a function declaration. In Rust, external function declarations are only allowed in extern blocks (see the FFI guide).
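For a quick sanity check on a current compiler, the translations can be exercised like this (a sketch; N and the helper names are illustrative):
const N: usize = 4;

fn zero() -> f64 { 0.0 }

// double (*d())[n]; -- a function returning a pointer to an array of N doubles.
fn d() -> *mut [f64; N] {
    Box::into_raw(Box::new([0.0; N]))
}

fn main() {
    let mut storage = [0.0f64; N];
    let a: [*mut f64; N] = [storage.as_mut_ptr(); N];  // double *a[n];
    let b: *mut [f64; N] = &mut storage;                // double (*b)[n];
    let c: [fn() -> f64; N] = [zero as fn() -> f64; N]; // double (*c[n])();
    let p = d();                                        // double (*d())[n];
    unsafe { drop(Box::from_raw(p)) };                  // reclaim d()'s allocation
    let _ = (a, b, c);
}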
The answer depends on what, exactly, the * is for. For example, is the first one being used as an array of pointers to doubles, or is it an array of arrays of doubles? Are the pointers nullable or not?
Also, is n a constant or not? If it is, then you want an array; if it's not, you want a Vec.
Also also, are these global or local declarations? Are they function arguments? There's different syntax involved for each.
Frankly, without more context, it's impossible to answer this question with any accuracy. Instead, I will give you the following:
The Rust documentation contains all the information you'll need, although it's spread out a bit. Check the reference and any appropriate-looking guides. The FFI Guide is probably worth looking at.
cdecl is a website that will unpick C declarations if that's the part you're having difficulty with. Just note that you'll have to remove the semicolon and the n or it won't parse.
The floating point types in Rust are f32 and f64, depending on whether you're using float or double. Also, don't get caught out: Rust's pointer-sized integers isize and usize are not equivalent to int in C. Prefer explicitly-sized types like i32 or u64, or types from libc like c_int; isize and usize should only be used for values that really are pointer-sized.
Normally, you'd write a reference to a T as &T or &mut T, depending on desired mutability (default in C is mutable, default in Rust is immutable).
If you want a nullable reference, use Option<&T>.
If you are trying to use these in a context where you start getting complaints about needing "lifetimes"... well, you're just going to have to learn the language. At that point, simple translation isn't going to work very well.
In Rust, array types are written as brackets around the element type. So an "array of doubles" would be [f64], and an array of size n would be [f64; n]. Typically, however, the actual equivalent to, say, double[] in C would be &[f64]; that is, a reference to an array (a slice), rather than the actual contents of the array.
Use of "raw pointers" is heavily discouraged in Rust, and you cannot use them meaningfully outside of unsafe code. In terms of syntax, a pointer to T is *const T or *mut T, depending on whether it's a pointer to constant or mutable data.
Function pointers are just written as fn (Args...) -> Result. So a function that takes nothing and returns a double would be fn () -> f64.

Haskell "collections" language design

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient
with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.
Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matching code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-specific code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.
I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that's a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].
A nitpick: Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) than a list does for (:), and its memory representation is also somewhat larger than a list's, as it is built from two data types, a FingerTree and a Deep.
The extra syntax for lists is fine; it hits the sweet spot of what lists are good at: cons (:) and pattern matching from the left. Whether sequences should also get extra syntax is a separate debate, but since you can get a very long way with lists, and lists are inherently simple, having good syntax for them is a must.
A list isn't an ideal representation for strings: the memory layout is inefficient, as each Char is wrapped in a constructor. This is why ByteStrings were introduced. Although they are laid out as an array, ByteStrings have to do a bit of administrative work, so [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
The other major lazy functional language, Clean, has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerformIO under the hood.
As of version 7.8, GHC supports overloading list literals; see the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).
I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names (mixfix!) à la Agda. Then the syntax for list constructors (:, []) wouldn't have been magic, allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map and define it as fmap yourself.
import Prelude hiding (map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.
