Is `usize`-indexing in Rust inefficient on some architectures? [duplicate]

This question already has answers here:
In Rust, is it faster to store larger types or to store smaller ones and cast them all the time?
(2 answers)
Why I can not use u8 as an index value of a Rust array?
(1 answer)
Closed 9 days ago.
On some architectures, the pointer size can be larger than the size of the default data type. Consequently, being forced to use a usize in Rust might result in inefficiencies.
For example, an 8-bit AVR architecture may have 16-bit pointers. With arrays shorter than 256 entries, we would have to carry out expensive 16-bit operations on the index, although single-cycle 8-bit operations would be sufficient.
Is this correct? If so, is there a way to enforce a smaller usize in such situations? Or does the compiler take care of this automatically?
Possibly related questions and issues:
What is the correct type to use for an array index?
https://github.com/rust-lang/rfcs/issues/1748
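As a rough illustration of the situation described in the question, the sketch below keeps the loop counter in a u8 and widens it to usize only at the point of indexing. The names `TABLE` and `sum_first` are hypothetical and exist only for this example. Whether a given backend (AVR included) then keeps the arithmetic in 8-bit registers depends on the optimizer and is not something this snippet can guarantee.

```rust
/// Small lookup table with fewer than 256 entries (hypothetical example data).
const TABLE: [u8; 4] = [10, 20, 30, 40];

/// Sums the first `len` entries of `TABLE` using an 8-bit loop counter.
/// Panics if `len` exceeds the table length, like any out-of-bounds index.
fn sum_first(len: u8) -> u16 {
    let mut total: u16 = 0;
    let mut i: u8 = 0;
    while i < len {
        // Indexing requires usize, so the u8 index is widened losslessly here.
        total += u16::from(TABLE[usize::from(i)]);
        i += 1;
    }
    total
}
```

Calling `sum_first(4)` adds up all four entries. The conversion `usize::from(i)` cannot fail, so the only cost in the source code is the explicit widening at the use site.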

Related

Why do they choose numbers like 16, 32, 128 in programming? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Sometimes in code, I see the developer chooses a number like 32 for a package of data. Or in a game, the loaded terrain of a map has the size of 128*128 points.
I know it has something to do with the maximum size of datatypes. Like a Char has 8 bits, etc.
But why don't they just use numbers like 100*100 for a map, a list, or a Minecraft chunk?
If I have 8 bits to store a (positive) number, I can count to 2^8 = 256.
When I choose the size of a map chunk, I could choose a width of 250 instead of 256. But it seems that is not a good idea. Why?
Sometimes developers do use numbers like 250 or 100. It's not at all uncommon. (1920 appears in a lot of screen resolutions for example.)
But numbers like 8, 32, and 256 are special because they're powers of 2. For datatypes, like 8-bit integers, the number of possible elements of this type is a power of 2, namely, 2^8 = 256. The sizes of various memory boundaries, disk pages, etc. work nicely with these numbers because they're also powers of two. For example, a 16,384-byte page can hold 2048 8-byte numbers, or 256 64-byte structures, etc. It's easy for a developer to count how many items of a certain size fit in a container of another size if both sizes are powers of two, because they've got many of the numbers memorized.
The previous answer emphasizes that data with these sizes fits well into memory blocks, which is of course true. However, it does not really explain why the memory blocks themselves have these sizes:
Memory has to be addressed. This means that the location of a given datum has to be calculated and stored somewhere in memory, often in a CPU register. To save space and calculation cost, these addresses should be as small as possible while still allowing as much memory as possible to be addressed. On a binary computer this leads to powers of 2 as optimal memory or memory block sizes.
There is another related reason: Calculations like multiplication and division by powers of 2 can be implemented by shifting and masking bits. This is much more performant than doing general multiplications or divisions.
An example: Say you have a 16 x 16 array of bytes stored in a contiguous block of memory starting at address 0. To calculate the row and column indices from the address, generally you need to calculate row=address / num_columns and column=address % num_columns (% stands for remainder of integer division).
In this special case it is much easier for a binary computer, e.g.:
address: 01011101
mask last 4 bits: 00001101 => column index
shift right by 4: 00000101 => row index
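The mask-and-shift step above can be sketched in Rust as follows. The function name `row_col` is made up for this illustration, and the code assumes the same 16 x 16 byte array starting at address 0, so every address fits in one byte.

```rust
/// Splits a byte address inside a 16 x 16 array into (row, column).
/// Because 16 is a power of two, the division and remainder reduce to
/// a shift and a mask.
fn row_col(address: u8) -> (u8, u8) {
    let column = address & 0b0000_1111; // keep the last 4 bits
    let row = address >> 4; // shift right by 4
    (row, column)
}
```

For the address 01011101 from the example, this yields row 5 and column 13, the same values as `address / 16` and `address % 16`.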

Bit Array with Find Max

So bit arrays and hash tables don't seem to inherently allow for a find-max type operation, but there are ways around it. I'm wondering if there's a way using the bit array alone without extra variables, pointers, or manipulating the start/end of the array, in some scenarios. For example...
I have integers {1,...,n} and a n-bit bit array. To keep a subset of the integers, I use the integer itself as the key in the bit array and set the bit to 1 if it is in the subset, or 0 if it is not.
For example, for integers {1,2,3,4} and subset {1,3}, the bit array would look like {1,0,1,0}.
It seems like there's no way to find the maximum element without somehow moving the bits around, which leads me to believe the O(1) dream is dead and perhaps the bit array won't work. Is something like this possible in O(log n)?
Thanks
Finding the highest set bit on a bit array of length n is O(n). If you need better, then you'll need to choose another data structure, or keep a high-water mark along with your bitmap.
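To make the linear scan concrete, here is a minimal sketch of a bit array over the integers 1..=n with a find-max operation. The type `BitSet` and its methods are hypothetical names for this example, not a library API. Packing the bits into 64-bit words lets the scan skip 64 positions at a time, but it is still linear in n, only with a smaller constant.

```rust
/// Bit set over the integers 1..=n, stored as 64-bit words.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Creates an empty set able to hold the integers 1..=n.
    fn new(n: usize) -> Self {
        BitSet { words: vec![0; (n + 63) / 64] }
    }

    /// Adds `k` to the set. `k` must be in 1..=n; otherwise this panics.
    fn insert(&mut self, k: usize) {
        let bit = k - 1; // integer k lives at bit position k - 1
        self.words[bit / 64] |= 1u64 << (bit % 64);
    }

    /// Returns the largest member, or None if the set is empty.
    /// Scans the words from the top, so the cost is O(n / 64).
    fn max(&self) -> Option<usize> {
        for (i, &word) in self.words.iter().enumerate().rev() {
            if word != 0 {
                // Position of the highest set bit inside this word.
                let top = 63 - word.leading_zeros() as usize;
                return Some(i * 64 + top + 1);
            }
        }
        None
    }
}
```

Keeping a high-water mark, as suggested above, would mean storing one extra integer next to `words` and updating it in `insert`; removals would then still need a scan like the one in `max`.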

Using 2d array vs array of derived type in Fortran 90

Assume you want a list of arrays, each having the same size. Is it better performance-wise to use a 2D array:
integer, allocatable :: data(:,:)
or an array of derived types:
type test
integer, allocatable :: content(:)
end type
type(test), allocatable :: data(:)
Of course, for arrays of different sizes, we don't have a choice. But how is the memory managed in the two cases? Also, is one of them better coding practice?
Choose the implementation which minimises the conceptual distance that your mind has to leap between the problem in your head and the solution in your code. The force of this approach increases with age, both the age of your code (good conceptual design is a solid foundation for future development) and your own age (the less effort understanding your code demands the longer you'll remain mentally competent enough to understand it).
As to the non-opinion-determined part of your question concerning the way that the memory is managed ... My naive expectation is that most compilers will, under most circumstances, allocate contiguous memory for the first of your outlines, and may not for the second. But I don't care enough about this to check, and I do not think that you should either. I don't, by this, suggest that you should not be interested in what is going on under the hood, but rather that you should be more concerned with the matters referred to in the first paragraph.
In general, you want to use the simplest data structure that suits your problem. If a 2d rectangular array meets your needs - and for a huge number of scientific computing problems, problems for which Fortran is a good choice, it does - then that's the choice you want.
The 2d array will be contiguous in memory, which will normally make accessing it faster both due to caching and one fewer level of indirection; the 2d array will also allow you to do things like data = data * 2 or data = 0. which the array-of-array approach doesn't [Edited to add: though as IanH points out in comments you can create a defined type and defined operations on those types to allow this]. Those advantages are great enough that even when you have "ragged arrays", if the range of expected row lengths isn't that large, implementing it as a rectangular 2d array is sometimes a choice worth considering.
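The difference in memory layout can be sketched outside Fortran as well. The Rust snippet below is only an analogy under stated assumptions: `Grid` and `ragged` are hypothetical names, and the grid is stored row-major, whereas Fortran stores arrays column-major. It contrasts one contiguous allocation with an array of separately allocated rows.

```rust
/// Rectangular grid stored in a single contiguous allocation.
struct Grid {
    rows: usize,
    cols: usize,
    cells: Vec<i32>,
}

impl Grid {
    fn new(rows: usize, cols: usize) -> Self {
        Grid { rows, cols, cells: vec![0; rows * cols] }
    }

    fn get(&self, r: usize, c: usize) -> i32 {
        assert!(r < self.rows && c < self.cols);
        self.cells[r * self.cols + c] // one index computation, one memory access
    }

    fn set(&mut self, r: usize, c: usize, value: i32) {
        assert!(r < self.rows && c < self.cols);
        self.cells[r * self.cols + c] = value;
    }

    /// Whole-array operation, comparable to `data = data * 2` in Fortran.
    fn scale(&mut self, factor: i32) {
        for cell in &mut self.cells {
            *cell *= factor;
        }
    }
}

/// Array-of-arrays layout: every row is its own heap allocation,
/// so rows may differ in length but are not contiguous with each other.
fn ragged(lengths: &[usize]) -> Vec<Vec<i32>> {
    lengths.iter().map(|&n| vec![0; n]).collect()
}
```

Reading an element of the `ragged` structure follows a pointer to the row first and then indexes into it, which is the extra level of indirection mentioned above.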

Fortran array of variable size arrays [closed]

Closed 10 years ago.
A simplified description of the problem:
There are exactly maxSize people shopping in a store. Each of them has a shopping list, containing the price of items (as integers). Using Fortran arrays, how can I represent all the shopping lists. The shopping lists may contain any number of items (1, 10, 1000000000).
(NOTE: The actual problem is far more complicated. It is not even about shopping.)
The lazy approach would be:
integer :: array(maxSize, A_REALLY_BIG_NUMBER)
However, this is very wasteful. I basically want the second dimension to be variable, and then allocate it for each person separately.
The obvious attempt, doomed to failure:
integer, allocatable :: array(:,:)
allocate(array(maxSize, :)) ! Compiler error
Fortran seems to require that arrays have a fixed size in each dimension.
This is weird, since most languages treat a multidimensional array as an "array of arrays", so you can set the size of each array in the "array of arrays" separately.
Here is something that does work:
type array1D
  integer, allocatable :: elements(:) ! The compiler is fine with this!
endtype array1D
type(array1D) :: array2D(10)
integer :: i
do i=1, size(array2D)
  allocate(array2D(i)%elements(sizeAt(i)))
enddo
If this is the only solution, I guess I will use it. But I was kind of hoping there would be a way to do this using intrinsic functions. Having to define a custom type for such a simple thing is a bit annoying.
In C, since an array is basically a pointer with fancy syntax, you can do this with an array of pointers:
int sizeAt(int x); //Function that gets the size in the 2nd dimension
int * array[maxSize];
for (int x = 0; x < maxSize; ++x)
array[x] = (int*)(calloc(sizeAt(x) , sizeof(int)));
Fortran seems to have pointers too. But the only tutorials I have found all say "NEVER USE THESE EVER" or something similar.
You seem to be complaining that Fortran isn't C. That's true. There are probably a near infinite number of reasons why the standards committees chose to do things differently, but here are some thoughts:
One of the powerful things about Fortran arrays is that they can be sliced.
a(:,:,3) = b(:,:,3)
is a perfectly valid statement. This could not be achieved if arrays were "arrays of pointers to arrays" since the dimensions along each axis would not necessarily be consistent (the very case you're trying to implement).
In C, there really is no such thing as a multidimensional array. You can implement something that looks similar using arrays of pointers to arrays, but that isn't really a multidimensional array since it doesn't share a common block of memory. This can have performance implications. In fact, in HPC (where many Fortran users spend their time), a multi-dimensional C array is often a 1D array wrapped in a macro to calculate the stride based on the size of the dimensions. Also, dereferencing a 7D array like this:
a[i][j][k][l][m][n][o]
is a good bit more difficult to type than:
a(i,j,k,l,m,n,o)
Finally, the solution that you've posted is closest to the C code that you're trying to emulate -- what's wrong with it? Note that for your problem statement, a more complex data-structure (like a linked-list) might be in order (which can be implemented in C or Fortran). Of course, linked-lists are the worst as far as performance goes, but if that's not a concern, it's probably the correct data structure to use as a "shopper" can decide to add more things into their "cart", even if it wasn't on the shopping list they took to the store.

Bitwise operation on floating point numbers (for graphics)? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
how to perform bitwise operation on floating point numbers
Hello, everyone!
Background:
I know that it is possible to apply bitwise operations to graphics (for example XOR). I also know that in graphics programs, image data is often stored in floating point data types (to be able, for example, to "multiply" the data by 1.05). So it must be possible to perform bitwise operations on floating point data, right?
I need to be able to perform bitwise operations on floating point data. I do not want to cast the data to long, bitwise manipulate it, and cast back to float.
I assume there exists a mathematical way to achieve this, which is more elegant (?) and/or faster (?).
I've seen some answers but they could not help, including this one.
EDIT:
That other question involves void-pointer casting, which would rely on deeper-level data representation. So it's not such an "exact duplicate".
By the time the "graphics data" hits the screen, none of it is floating point. Bitwise operations are really done on bit strings. Bitwise operations only make sense on numbers because of a consistent encoding scheme to binary. Trying to get any kind of logical bitwise operations on floats other than extracting the exponent or mantissa is a road to hell.
Basically, you probably don't want to do this. Why do you think you do?
A floating point number is just another bit pattern in memory, so you could:
measure the size of the data type (e.g. 32 bits), e.g. sizeof(pixel)
get a pointer to it, using an integer type of the same size, e.g. UINT *ptr = (UINT *)&pixel
operate on the value behind the pointer, e.g. newpixel = (*ptr) ^ mask, where mask is the bit pattern you want to combine it with
This should at least work with non-negative values and should have no considerable computational overhead, at least in an unmanaged context like C++. Maybe you have to mask out some bits when doing your operation, and - depending on the type - you may have to treat exponent and mantissa separately.
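The same idea can be sketched in Rust without any pointer casts, since `f32::to_bits` and `f32::from_bits` reinterpret the IEEE 754 bit pattern directly. The function names `xor_bits` and `fields` are made up for this example. Note that the result of XORing two bit patterns is rarely a meaningful number, which is the caveat raised in the first answer.

```rust
/// XORs the raw bit patterns of two f32 values and reinterprets
/// the result as an f32 again.
fn xor_bits(a: f32, b: f32) -> f32 {
    f32::from_bits(a.to_bits() ^ b.to_bits())
}

/// Splits an f32 into its (sign, biased exponent, mantissa) bit fields.
fn fields(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    let sign = bits >> 31; // 1 bit
    let exponent = (bits >> 23) & 0xFF; // 8 bits, biased by 127
    let mantissa = bits & 0x007F_FFFF; // 23 bits
    (sign, exponent, mantissa)
}
```

For instance, `fields(1.0)` gives a sign of 0, a biased exponent of 127, and a mantissa of 0, while XORing 1.0 with -1.0 leaves only the sign bit set.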
