Arithmetic operations in Assembly - math

I am learning assembly language. I find that arithmetic in assembly can be either signed or unsigned. The rules are different for the two types of arithmetic, and I find it is a programmer's headache to decide which rules to apply. So a programmer should know beforehand whether the arithmetic involves negative numbers or not: if yes, signed arithmetic rules should be used; otherwise the simpler and easier unsigned arithmetic will do.
The main problem I find with unsigned arithmetic is: what if the result is larger than its storage area? It can easily be solved by using a bigger-than-required storage area for the data, but that consumes extra bytes and increases the size of the data segment. If the size of the data is no issue, can't we use this technique freely?

If you are the programmer, you are in control of your data representation, within the bounds of the requirements of your software's target domain. This means you need to know, well before you actually start touching code, what type of data you are going to be dealing with, how it is going to be arranged (in the case of complex data types), and how it is going to be encoded (floating-point, unsigned integer, signed integer, etc.). It is "safest" to use the operations that match the type of the data you're manipulating, which, if you've done your design right, you should already know.

It's not that simple. Most arithmetic operations are sign agnostic: they are neither signed nor unsigned.
The interpretation of the result, which is determined by the program specification, is what makes them signed or unsigned, not the operation itself. The proper flavor of compare instruction, on the other hand, always has to be chosen carefully.
In some CPU architectures there are distinct signed and unsigned divide instructions, but that is about as far as it goes. Most CPUs offer right-shift instructions in two flavors: an arithmetic shift that preserves the high (sign) bit and a logical shift that fills it with zero, which handle the signed and unsigned cases respectively.
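To make this concrete, here is a small C++ sketch (rather than assembly, and with made-up values) showing that addition produces the same bit pattern either way, while comparison and right shift depend on the interpretation:
#include <cstdint>
#include <cstdio>

int main() {
    // Identical bit patterns, viewed as unsigned and as signed (two's complement).
    uint8_t ua = 0xF0, ub = 0x10;              // 240 and 16 when read as unsigned
    int8_t  sa = (int8_t)0xF0, sb = 0x10;      // -16 and 16 when read as signed

    // Addition is sign-agnostic: both produce the bit pattern 0x00.
    printf("unsigned add: 0x%02X\n", (uint8_t)(ua + ub));
    printf("signed   add: 0x%02X\n", (uint8_t)(sa + sb));

    // Comparison and right shift are where the interpretation matters.
    printf("unsigned compare: %d\n", ua < ub);                 // 0: 240 is not less than 16
    printf("signed   compare: %d\n", sa < sb);                 // 1: -16 is less than 16
    printf("logical    shift: 0x%02X\n", (uint8_t)(ua >> 4));  // 0x0F (zero-filled)
    printf("arithmetic shift: 0x%02X\n", (uint8_t)(sa >> 4));  // 0xFF on typical compilers (sign-extended)
    return 0;
}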

Related

What Are The Reasons For Bit Shifting A Float Before Sending It Via A Network

I work with Unity and C#. When making multiplayer games I've been told that for values like positions, which are floats, I should use a bit-shift operator on them before sending them and reverse the operation on receive. I have been told this not only allows for larger values but is also capable of maintaining floating-point precision that might otherwise be lost. However, I do not wish to run this operation every time I receive a packet unless I have to. That said, the bottlenecks seem to be the actual parsing of the received bytes, especially without message framing and when attempting to move from string to byte array. (But that's another story!)
My questions are:
Are these valid reasons to undergo the operation? Are they accurate statements?
If not, should I be running bit-shift ops on my floats at all?
If I should, what are the real reasons to do it?
Any additional information would be most appreciated.
One of the resources I'm referring to:
The main reason for going back and forth to/from network byte order is to combat endianness-caused problems, mainly to ensure that each byte of multi-byte values (long, int, but also float) is read and written in a way that gives the same result regardless of architecture. This issue can theoretically be ignored if you are sure you are exchanging data between systems using the same endianness, but that's a rather bad idea from the very beginning, as you are simply creating unneeded technical debt and keeping unjustified exceptions in the code ("It all works, BUT only on the same endianness. What can go wrong?").
Depending on your app architecture you can rewrite the packet payload/data once you receive it and then use that version further in the code. Also note that you need to encode the data again prior to sending it out.
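For illustration, here is a minimal C++ sketch (not Unity/C# code) of that round trip: the float's value is never bit-shifted, only its byte order is normalized. It assumes a 32-bit IEEE-754 float and the POSIX htonl/ntohl functions:
#include <arpa/inet.h>  // htonl/ntohl on POSIX systems (winsock2.h on Windows)
#include <cstdint>
#include <cstring>

// Serialize a float into 4 bytes in network (big-endian) byte order.
void put_float(float value, unsigned char out[4]) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret the bits; the value itself is untouched
    bits = htonl(bits);                       // host -> network byte order
    std::memcpy(out, &bits, sizeof bits);
}

// Deserialize 4 network-order bytes back into a float.
float get_float(const unsigned char in[4]) {
    std::uint32_t bits;
    std::memcpy(&bits, in, sizeof bits);
    bits = ntohl(bits);                       // network -> host byte order
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}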

Why both? vperm2f128 (AVX) vs vperm2i128 (AVX2)

AVX introduced the instruction vperm2f128 (exposed via _mm256_permute2f128_si256), while AVX2 introduced vperm2i128 (exposed via _mm256_permute2x128_si256).
They both seem to do exactly the same thing, and their respective latencies and throughputs also seem to be identical.
So why do both instructions exist? There has to be some reasoning behind that. Is there maybe something I have overlooked? Given that AVX2 operates on data structures introduced with AVX, I cannot imagine that a processor will ever exist that supports AVX2 but not AVX.
There's a bit of a disconnect between the intrinsics and the actual instructions that are underneath.
AVX:
All 3 of these generate exactly the same instruction, vperm2f128:
_mm256_permute2f128_pd()
_mm256_permute2f128_ps()
_mm256_permute2f128_si256()
The only difference is the types, which don't exist at the instruction level.
vperm2f128 is a 256-bit floating-point instruction. In AVX, there are no "real" 256-bit integer SIMD instructions. So even though _mm256_permute2f128_si256() is an "integer" intrinsic, it's really just syntactic sugar for this:
_mm256_castpd_si256(
    _mm256_permute2f128_pd(
        _mm256_castsi256_pd(x),
        _mm256_castsi256_pd(y),
        imm
    )
);
This does a round trip from the integer domain to the FP domain, thus incurring bypass delays. As ugly as this looks, it is the only way to do it in AVX-only land.
vperm2f128 isn't the only instruction to get this treatment; I count at least 3 of them:
vperm2f128 / _mm256_permute2f128_si256()
vextractf128 / _mm256_extractf128_si256()
vinsertf128 / _mm256_insertf128_si256()
Together, it seems that the use case for these intrinsics is to load data as 256-bit integer vectors and shuffle them into multiple 128-bit integer vectors for integer computation. Likewise the reverse, where you store as 256-bit vectors.
Without these "hack" intrinsics, you would need to use a lot of cast intrinsics.
Either way, a competent compiler will try to optimize the types as well. Thus it will generate floating-point loads/stores and shuffles even if you are using 256-bit integer loads. This reduces the number of bypass delays to only one layer (when you go from the FP shuffle to 128-bit integer computation).
AVX2:
AVX2 cleans up this madness by adding proper 256-bit integer SIMD support for everything - including the shuffles.
The vperm2i128 instruction is new along with a new intrinsic for it, _mm256_permute2x128_si256().
This, along with _mm256_extracti128_si256() and _mm256_inserti128_si256() lets you do 256-bit integer SIMD and actually stay completely in the integer domain.
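As a small illustration, here is a sketch (compile with AVX2 enabled, e.g. -mavx2) that swaps the two 128-bit halves of a __m256i with vperm2i128 while staying entirely in the integer domain:
#include <immintrin.h>
#include <cstdint>
#include <cstdio>

int main() {
    // A 256-bit integer vector with 32-bit lanes 0..7.
    __m256i x = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);

    // Swap the two 128-bit halves without leaving the integer domain:
    // imm = 0x01 picks the source's high half for the low lane and its low half for the high lane.
    __m256i swapped = _mm256_permute2x128_si256(x, x, 0x01);

    alignas(32) std::int32_t out[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(out), swapped);
    for (int i = 0; i < 8; ++i) std::printf("%d ", out[i]);  // prints: 4 5 6 7 0 1 2 3
    std::printf("\n");
    return 0;
}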
The distinction between integer and FP versions of the same instruction has to do with bypass delays. In older processors, there were delays to move data between the int and FP domains. While the SIMD registers themselves are type-agnostic, the hardware implementation isn't, and there is extra latency to route the output of an FP instruction to the input of an integer instruction (and vice versa).
Thus it was important (from a performance standpoint) to use the correct instruction type to match the actual datatype that was being operated on.
On the newest processors (Skylake and later?), there no longer seem to be any int/FP bypass delays with respect to the shuffle instructions. While the instruction set still has this distinction, shuffle instructions that do the same thing with different "types" probably map to the same uop now.

Using hash as a bucket index, modulo vs bit-mask

I've been looking into hash tables, where some data is hashed and the hash is used as a bucket index.
Some libraries use the modulo of the hash with the bucket count, and others use a bit-mask, where only the bits covered by the mask are used (ensuring the range is not exceeded).
bitmask:
index = h->hash_func(key) & h->hash_mask;
modulo:
index = h->hash_func(key) % h->bucket_tot;
While there are obvious differences between the two (bucket-size constraints with bit-masks, the need for the hash to distribute well in its lower bits, the speed of modulo, and so on), are there strong reasons to choose one over the other?
(I'll probably try to benchmark for my own use case, but I'm curious what's already known on the matter.)
Note, this is simply for a key:value store (dictionary/hash/associative array) and is not security related.
Example of a dynamic resizing, chaining hash table implementation using bit-mask:
https://github.com/amadvance/tommyds/blob/master/tommyds/tommyhashdyn.c
https://github.com/GNOME/glib/blob/master/glib/ghash.c
Example using modulo:
https://www.daniweb.com/software-development/c/threads/104887/sucinct-example-of-hash-table-w-chaining
You mentioned a "bucket" index, so I assume you mean hash tables with separate chaining as the collision-resolution strategy. In that case there are no reasons for choosing modulo or a bit mask that are "stronger" than the ones you mentioned (which, by the way, are not as obvious as you suggest).
In some languages, most notably Java and other JVM-based ones, an array index is a positive signed 32-bit integer, so the maximum power-of-two array size usable with a bit mask is 2^30. That can be insufficient, and it is a strong reason to use a non-power-of-two table size and modulo, with which you can get very close to 2^31 - 1 (the largest signed 32-bit integer). But since you used C++ syntax, this shouldn't be a concern for you.
Also, if you meant not only separate chaining: some open-addressing collision-resolution algorithms require the table size to meet certain conditions. For example, if you implement double hashing, the table size should be prime. In that case you obviously have to use modulo to obtain the initial index in the table.
It isn't always just about performance, either; sometimes it's about the domain of your problem. You may, for example, need to hash negative numbers as well. With modulo you have to write special cases to handle them; not so with a bitmask.
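As a quick illustration of both points, here is a small C++ sketch (hash values and bucket count are made up) showing that masking and modulo agree for power-of-two bucket counts with unsigned hashes, while masking also keeps a negative signed hash in range:
#include <cassert>
#include <cstdint>

int main() {
    // For a power-of-two bucket count, masking and modulo pick the same bucket
    // for an unsigned hash.
    const std::uint32_t bucket_tot = 64;             // must be a power of two for masking
    const std::uint32_t hash_mask  = bucket_tot - 1; // 0x3F

    std::uint32_t h = 0x9E3779B9u;
    assert((h & hash_mask) == (h % bucket_tot));

    // A hash stored in a signed type: masking still lands in [0, bucket_tot),
    // whereas -123456789 % 64 in C/C++ is negative and would need a fix-up.
    std::int32_t signed_hash = -123456789;
    std::uint32_t index = static_cast<std::uint32_t>(signed_hash) & hash_mask;
    assert(index < bucket_tot);
    return 0;
}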

BigInt and indirection: is it feasible to implement BigInt with pointer-arithmetic on the pointer, for small values?

I assume an implementation in a language that allows you to treat pointers as integers, including doing the standard arithmetic on them. If this is unrealistic due to hardware constraints, please let me know. If programming languages normally don't have such powerful pointer arithmetic, but it is feasible in practice, then I still want to know if this approach to implementing BigInt is feasible.
I don't really have any experience with low-level - as in, programming with pointers - programming, so many of my assumptions in the following might be wrong.
Question: As far as I know, a BigInt (an arbitrary-precision/size integer) might be implemented as a dynamic array of integers that grows as needed. The data structure might then be represented as a pointer to that array of integers. But, assuming that the pointer is an integer just like the ones in the array, and that one can do pointer arithmetic on it, is it feasible to represent the value of the BigInt by simply using the pointer itself? Then one could avoid the indirection for small integer values.
Since the value could be either a real pointer to an array of integers in memory or an integer value in its own right, you would have to have some way of knowing how to treat it, especially since its role can change over the life cycle of the BigInt. Suppose you do this by setting the most significant bit to 1 if the pointer is really a pointer, and to 0 otherwise. As far as treating it as an integer is concerned, this seems simple enough: check whether that bit is set to 1 before doing anything with it. If it is not, do whatever arithmetic you need on the value itself, check whether it has overflowed, and do the appropriate thing if it did.
But this has a problem: does the pointer use its full range to point to memory addresses? If so, then there doesn't seem to be any way of using a bit pattern to distinguish integers from pointers: every bit pattern is a potential memory address. It seems reasonable that pointers are represented as signed integers, though to my mind they might also be represented as unsigned integers if that makes the implementation simpler.
So, if pointers are signed, then you don't seem to be able to use pointers as integers for this purpose. If so, is it feasible (that is, efficient compared to the alternatives) to represent the BigInt as a struct (or record, if you prefer) with two members: a pointer to an array and an integer that is used when the value of the BigInt is small? If the pointer to the array is null, the integer is used; if not, the pointer to the array is used and the integer in the struct is ignored. This makes for a more "bloated" data structure, but it might help avoid indirection in some cases, assuming you don't need a pointer to this struct and can pass it around by value.
Another question: Is this done in practice?
On 32-bit machines, the lower two bits of a pointer are almost always 0 because addresses are 4-byte aligned. Similarly, on 64-bit machines, the lower three bits will be 0.
You can use this fact to tag, in the least significant bit of the pointer, whether it's a number or not. One simple option would be to set the LSB to 1 if it's a number and to 0 if it's a pointer. When performing arithmetic, you first check the LSB to see whether you have a pointer or an integer. If it's 1, you arithmetically shift the value right by one bit to get the real integer, then use that in the computation. If it's 0, you simply follow the pointer to the full representation.
You could conceivably use the fact that you have 2 or 3 bits of space to encode more possible representations. For example, you could have the number be either an integer, a pointer to a fixed-sized buffer, or a pointer to a variable-sized buffer, using the free pointer bits to encode which case you happen to be in.
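For concreteness, here is a rough C++ sketch of the LSB-tagging scheme described above; the names are made up, and overflow promotion and the heap-allocated path are left out:
#include <cstdint>
#include <cstdlib>

// A BigInt handle is one machine word: if the lowest bit is 1, the remaining
// bits hold a small integer directly; if it is 0, the word is a pointer to a
// heap-allocated digit array (which is at least 2-byte aligned, so its low
// bit is naturally 0).
using Handle = std::uintptr_t;

inline bool is_small(Handle h) { return (h & 1u) != 0; }
inline Handle make_small(std::intptr_t v) { return ((std::uintptr_t)v << 1) | 1u; }
inline std::intptr_t small_value(Handle h) { return (std::intptr_t)h >> 1; }  // arithmetic shift restores the sign
inline Handle make_big(std::uint64_t* digits) { return (std::uintptr_t)digits; }
inline std::uint64_t* big_pointer(Handle h) { return (std::uint64_t*)h; }

// Add two handles: stay on the fast path while both values are small and the
// result still fits after tagging; otherwise promote (overflow checking simplified).
Handle add(Handle a, Handle b) {
    if (is_small(a) && is_small(b)) {
        std::intptr_t result = small_value(a) + small_value(b);
        if (small_value(make_small(result)) == result)  // survived the tag shift, so it fits
            return make_small(result);
    }
    // ... fall back to the heap-allocated, arbitrary-precision representation ...
    std::abort();  // placeholder in this sketch
}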
Hope this helps!

What the best ways to use decimals and datetimes with protocol buffers?

I would like to find out the optimum way of storing some common data types that are not included in the list supported by protocol buffers:
datetime (seconds precision)
datetime (milliseconds precision)
decimals with fixed precision
decimals with variable precision
lots of bool values (if you have lots of them, it looks like you'll have 1-2 bytes of overhead for each of them due to their tags)
Also, the idea is to map them very easily to the corresponding C++/Python/Java data types.
The protobuf design rationale is most likely to keep data-type support as "native" as possible, so that it's easy to adopt new languages in the future. I suppose they could provide built-in message types, but where do you draw the line?
My solution was to create two message types:
DateTime
TimeSpan
This is only because I come from a C# background, where these types are taken for granted.
In retrospect, TimeSpan and DateTime may have been overkill, but it was a "cheap" way of avoiding conversion from h/m/s to s and vice versa; that said, it would have been simple to just implement a utility function such as:
int TimeUtility::ToSeconds(int h, int m, int s)
Bklyn pointed out that heap memory is used for nested messages; in some cases this is clearly very valid, and we should always be aware of how memory is used. But in other cases it can be of less concern, where we're worried more about ease of implementation (this is the Java/C# philosophy, I suppose).
There's also a small disadvantage to using non-intrinsic types with the protobuf TextFormat::Printer: you cannot specify the format in which they are displayed, so it'll look something like:
my_datetime {
    seconds: 10
    minutes: 25
    hours: 12
}
... which is too verbose for some. That said, it would be harder to read if it were represented in seconds.
To conclude, I'd say:
If you're worried about memory/parsing efficiency, use seconds/milliseconds.
However, if ease of implementation is the objective, use nested messages (DateTime, etc).
Here are some ideas based on my experience with a wire protocol similar to Protocol Buffers.
datetime (seconds precision)
datetime (milliseconds precision)
I think the answer to these two would be the same; you would just typically be dealing with a smaller range of numbers in the case of seconds precision.
Use a sint64/sfixed64 to store the offset in seconds/milliseconds from some well-known epoch, like midnight GMT 1/1/1970. This is how Date objects are internally represented in Java. I'm sure there are analogues in Python and C++.
If you need time zone information, pass around your date/times in terms of UTC and model the pertinent time zone as a separate string field. For that, you can use the identifiers from the Olson Zoneinfo database since that has become somewhat standard.
This way you have a canonical representation for date/time, but you can also localize to whatever time zone is pertinent.
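A minimal C++ sketch of that epoch-offset encoding (using std::chrono and assuming system_clock counts from the Unix epoch, which is guaranteed from C++20 onward and true in practice before that):
#include <chrono>
#include <cstdint>

// Encode a time point as a signed 64-bit count of milliseconds since the Unix
// epoch (UTC), suitable for a sint64/sfixed64 protobuf field, and decode it back.
std::int64_t to_epoch_millis(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    return duration_cast<milliseconds>(tp.time_since_epoch()).count();
}

std::chrono::system_clock::time_point from_epoch_millis(std::int64_t ms) {
    using namespace std::chrono;
    return system_clock::time_point(duration_cast<system_clock::duration>(milliseconds(ms)));
}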
decimals with fixed precision
My first thought is to use a string similar to how one constructs Decimal objects from Python's decimal package. I suppose that could be inefficient relative to some numerical representation.
There may be better solutions depending on what domain you're working with. For example, if you're modeling a monetary value, maybe you can get away with using a uint32/64 to communicate the value in cents as opposed to fractional dollar amounts.
There are also some useful suggestions in this thread.
decimals with variable precision
Doesn't Protocol Buffers already support this with float/double scalar types? Maybe I've misunderstood this bullet point.
Anyway, if you had a need to work around those scalar types, you can encode the IEEE-754 representation into a uint32 or uint64 (for float and double respectively). For example, Java allows you to extract the IEEE-754 bits from Float/Double objects and to build them back again. There are analogous mechanisms in C++/Python.
lots of bool values (if you have lots of them it looks like you'll have 1-2 bytes overhead for each of them due to their tags)
If you are concerned about wasted bytes on the wire, you could use bit-masking techniques to compress many booleans into a single uint32 or uint64.
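For example, a small C++ sketch of that bit-masking convention; the flag names are made up, and both sides must agree on the bit assignments:
#include <cstdint>

// Pack up to 32 boolean flags into one uint32 protobuf field; the enum and
// field layout here are hypothetical and must match on sender and receiver.
enum FlagBit : std::uint32_t {
    FLAG_ACTIVE   = 1u << 0,
    FLAG_VERIFIED = 1u << 1,
    FLAG_DELETED  = 1u << 2,
    // ... up to bit 31
};

inline std::uint32_t set_flag(std::uint32_t mask, FlagBit f, bool on) {
    return on ? (mask | f) : (mask & ~static_cast<std::uint32_t>(f));
}

inline bool get_flag(std::uint32_t mask, FlagBit f) {
    return (mask & f) != 0;
}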
Because there isn't first-class support in Protocol Buffers, all of these techniques require a bit of a gentlemen's agreement between agents. Perhaps using a naming convention on your fields like "_dttm" or "_mask" would help communicate when a given field has additional encoding semantics above and beyond the default behavior of Protocol Buffers.
Sorry, not a complete answer, but a "me too".
I think this is a great question, and one I'd love an answer to myself. The inability to natively describe fundamental types like datetimes and (for financial applications) fixed-point decimals, or to map them to language-specific or user-defined types, is a real killer for me. It's more or less prevented me from being able to use the library, which I otherwise think is fantastic.
Declaring your own "DateTime" or "FixedPoint" message in the proto grammar isn't really a solution, because you'll still need to convert your platform's representation to/from the generated objects manually, which is error-prone. Additionally, these nested messages get stored as pointers to heap-allocated objects in C++, which is wildly inefficient when the underlying type is basically just a 64-bit integer.
Specifically, I'd want to be able to write something like this in my proto files:
message Something {
    required fixed64 time  = 1 [cpp_type="boost::posix_time::ptime"];
    required int64   price = 2 [cpp_type="fixed_point<int64_t, 4>"];
    ...
};
And I would be required to provide whatever glue was necessary to convert these types to/from fixed64 and int64 so that the serialization would work. Maybe through something like adobe::promote?
For datetimes with millisecond resolution I used an int64 that holds the datetime as YYYYMMDDHHMMSSmmm. This makes it both concise and readable and, surprisingly, it will last a very long time.
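A minimal sketch of that packing in C++ (component validation omitted):
#include <cstdint>

// Pack calendar components into the YYYYMMDDHHMMSSmmm form described above,
// e.g. 2024-06-01 12:30:45.250 -> 20240601123045250.
std::int64_t pack_datetime(int year, int month, int day,
                           int hour, int minute, int second, int millis) {
    std::int64_t v = year;
    v = v * 100  + month;
    v = v * 100  + day;
    v = v * 100  + hour;
    v = v * 100  + minute;
    v = v * 100  + second;
    v = v * 1000 + millis;
    return v;
}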
For decimals, I used byte[], knowing that there's no better representation that won't be lossy.
