BigInt and indirection: is it feasible to implement BigInt with pointer-arithmetic on the pointer, for small values?

I assume an implementation in a language that allows you to treat pointers as integers, including doing the standard arithmetic on them. If this is unrealistic due to hardware constraints, please let me know. If programming languages normally don't have such powerful pointer arithmetic, but it is feasible in practice, then I still want to know if this approach to implementing BigInt is feasible.
I don't really have any experience with low-level programming - as in, programming with pointers - so many of my assumptions in the following might be wrong.
Question: As far as I know, implementing a BigInt - an arbitrary-precision/size integer - might be done with a dynamic array of integers, which grows as needed. The data structure might then be represented as a pointer to that array of integers. But, assuming that the pointer is an integer just like the ones in the array, and that one can do pointer arithmetic on that pointer, is it feasible to represent the value of the BigInt by simply using that pointer? Then one could avoid the indirection for small integer values.
Since the pointer could be either a real pointer to the memory address of an array of integers or a plain integer value, you would have to have some way of knowing which of the two to treat it as - especially since its role may change over the life cycle of the BigInt. Suppose you do this by setting the most significant bit to 1 if the pointer is really a pointer, and to 0 otherwise. As far as treating it as an integer is concerned, this seems simple enough: check whether that bit is set to 1 before doing anything with it. If it is not, do whatever arithmetic you need directly on the value, see if it has overflowed, and do the appropriate thing if it did.
But this has a problem: does the pointer use its full range to point to memory addresses? If so, then there doesn't seem to be any way of using any bit-pattern to distinguish integers from pointers: every bit-pattern is a potential memory address. It seems reasonable that pointers are represented as signed integers, though to my mind they might also be represented as unsigned integers if that makes the implementation simpler.
So, if pointers are signed, then you don't seem to be able to use pointers as integers for this purpose. If so, is it feasible (that is: efficient compared to the alternatives) to represent the BigInt as a struct (or record, if you want) with two members: the pointer to an array and an integer that is used when the value of the BigInt is small? If the pointer to the array is null, the integer is used. If not, use the pointer to the array and ignore the integer in the struct. This makes for a more "bloated" data structure, but it might help avoid the indirection sometimes, assuming that you don't need a pointer to this struct and can pass it around as a value.
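For what it's worth, a minimal sketch in C++ of the struct-based representation described above; the names and the limb type are invented purely for illustration, and a real implementation would also need to track the array's length somewhere:

#include <cstdint>

// If 'digits' is null, the value lives in 'small' and no allocation exists;
// otherwise 'digits' points to a heap-allocated array of limbs and 'small'
// is ignored. (A real implementation would also store the array length.)
struct BigInt {
    int64_t   small;
    uint64_t* digits;
};

inline bool is_small(const BigInt& b) { return b.digits == nullptr; }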
Another question: Is this done in practice?

On 32-bit machines, the lower two bits of pointers are almost always 0 because addresses are aligned to 4-byte (32-bit) boundaries. Similarly, on 64-bit machines, the lower three bits will be 0 because of 8-byte alignment.
You can use this fact to use the least-significant bit of the pointer to tag whether it's a number or not. One simple option would be to set the LSB to 1 if it's a number and to 0 if it's a pointer. When performing arithmetic computations, you first check the LSB to see whether you have a pointer or an integer. If it's 1, you arithmetically right-shift the number over one bit to get the real integer value, then use that value in the computation. If it's 0, you simply follow the pointer to the representation.
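To make that concrete, here is a rough sketch of that LSB-tagging scheme in C++; the names, helper functions, and the use of uintptr_t are illustrative assumptions, not a reference implementation:

#include <cstdint>

// LSB == 1: the value is a small integer stored (shifted) in the upper bits.
// LSB == 0: the value is an aligned pointer to the big-number representation.
using BigIntRep = uintptr_t;

inline bool is_small(BigIntRep r) { return (r & 1u) != 0; }

// Recover the small value with an arithmetic right shift (restores the sign
// on typical targets; defined behavior since C++20).
inline intptr_t small_value(BigIntRep r) { return static_cast<intptr_t>(r) >> 1; }

// The caller must first check that v fits in one bit less than a machine word.
inline BigIntRep make_small(intptr_t v) { return (static_cast<uintptr_t>(v) << 1) | 1u; }

// For an aligned allocation the LSB is already 0, so no masking is needed.
inline uint64_t* as_digits(BigIntRep r) { return reinterpret_cast<uint64_t*>(r); }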
You could conceivably use the fact that you have 2 or 3 bits of space to encode more possible representations. For example, you could have the number be either an integer, a pointer to a fixed-sized buffer, or a pointer to a variable-sized buffer, using the free pointer bits to encode which case you happen to be in.
Hope this helps!

Related

What does crossbeam_epoch::Shared::as_raw mean by "Converts the pointer to a raw pointer (without the tag)"?

Can someone translate this into something that makes sense for me:
Converts the pointer to a raw pointer (without the tag).
What is the difference between a pointer and a raw pointer?
The Stack Overflow raw-pointer tag says neither "smart" nor "shared" which again is mystifying.
What are Crossbeam's Shared::as_raw's "tags" all about?
crossbeam_epoch::Shared is a smart pointer. That is, a pointer plus extra stuff. In C++ or Rust, "smart pointer" is the term used for a pointer wrapper that adds any of the following:
Ownership information
Lifetime information
Packing extra data in unused bits
Copy-on-write behavior
Reference counting
In that context, a raw pointer is just the wrapped pointer, without all the extra stuff.
crossbeam_epoch::Shared fits (among others) in the “Packing extra data in unused bits” category above. Most data in modern computers is aligned, that is, addresses are a multiple of some power of two. This means that all low bits of the addresses are always 0. One can use that fact to store a few extra bits of information in a pointer.
This extra data is called a tag by this particular library; however, that term isn't as common as raw pointer.

Why both? vperm2f128 (AVX) vs vperm2i128 (AVX2)

AVX introduced the instruction vperm2f128 (exposed via _mm256_permute2f128_si256), while AVX2 introduced vperm2i128 (exposed via _mm256_permute2x128_si256).
They both seem to be doing exactly the same, and their respective latencies and throughputs also seem to be identical.
So why do both instructions exist? There has to be some reasoning behind that. Is there maybe something I have overlooked? Given that AVX2 operates on data structures introduced with AVX, I cannot imagine that a processor will ever exist that supports AVX2 but not AVX.
There's a bit of a disconnect between the intrinsics and the actual instructions that are underneath.
AVX:
All 3 of these generate exactly the same instruction, vperm2f128:
_mm256_permute2f128_pd()
_mm256_permute2f128_ps()
_mm256_permute2f128_si256()
The only difference is the types, which don't exist at the instruction level.
vperm2f128 is a 256-bit floating-point instruction. In AVX, there are no "real" 256-bit integer SIMD instructions. So even though _mm256_permute2f128_si256() is an "integer" intrinsic, it's really just syntax sugar for this:
_mm256_castpd_si256(
    _mm256_permute2f128_pd(
        _mm256_castsi256_pd(x),
        _mm256_castsi256_pd(y),
        imm
    )
);
This does a round trip from the integer domain to the FP domain, thus incurring bypass delays. As ugly as this looks, it is the only way to do it in AVX-only land.
vperm2f128 isn't the only instruction to get this treatment; I can find at least 3 of them:
vperm2f128 / _mm256_permute2f128_si256()
vextractf128 / _mm256_extractf128_si256()
vinsertf128 / _mm256_insertf128_si256()
Together, it seems that the use case of these intrinsics is to load data as 256-bit integer vectors and shuffle them into multiple 128-bit integer vectors for integer computation. Likewise the reverse, where you store as 256-bit vectors.
Without these "hack" intrinsics, you would need to use a lot of cast intrinsics.
Either way, a competent compiler will try to optimize the types as well. Thus it will generate floating-point loads/stores and shuffles even if you use 256-bit integer loads. This reduces the number of bypass delays to only one layer (when you go from the FP shuffle to 128-bit integer computation).
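As a sketch of that use case (not code from the question; the function and data layout are made up), the AVX-only pattern might look like this:

#include <immintrin.h>

// AVX only: load 256 bits of integer data, split it into two 128-bit halves,
// and do the actual integer math with SSE instructions.
__m128i sum_halves_avx(const int* p) {
    __m256i v  = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
    __m128i lo = _mm256_castsi256_si128(v);       // low half, no instruction emitted
    __m128i hi = _mm256_extractf128_si256(v, 1);  // compiles to vextractf128
    return _mm_add_epi32(lo, hi);                 // 128-bit integer add (SSE2)
}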
AVX2:
AVX2 cleans up this madness by adding proper 256-bit integer SIMD support for everything - including the shuffles.
The vperm2i128 instruction is new along with a new intrinsic for it, _mm256_permute2x128_si256().
This, along with _mm256_extracti128_si256() and _mm256_inserti128_si256() lets you do 256-bit integer SIMD and actually stay completely in the integer domain.
The distinction between integer and FP versions of the same instructions has to do with bypass delays. In older processors, there were delays to move data between the int and FP domains. While the SIMD registers themselves are type-agnostic, the hardware implementation isn't, and there is extra latency to get data output by an FP instruction into the input of an integer instruction (and vice versa).
Thus it was important (from a performance standpoint) to use the correct instruction type to match the actual datatype that was being operated on.
On the newest processors (Skylake and later?), there no longer seem to be any int/FP bypass delays with regard to the shuffle instructions. While the instruction set still has this distinction, shuffle instructions that do the same thing with different "types" probably map to the same uop now.

How can I create an owning pointer to an unsized type?

Dealing with values of type str in Rust is clumsy because they do not implement the trait Sized. Therefore, they can only be accessed by pointer.
For my application, using ordinary pointers with lifetimes is not very helpful. Rather, I want an owning fat pointer that guarantees that the contained object will last as long as the pointer does (and no longer), but allows holding values of unknown size.
Box<T> works for an unsized T; thus Box<str>, Box<[T]> and so forth. The important distinction to note between Box<str> and String is that the latter has a capacity member as well, increasing its memory usage by one word but allowing for efficient appending as it may not need to reallocate for every push, whereas a similar method on a Box<str> would need to. The same is true of Box<[T]> versus Vec<T>, with the former being a fixed-size slice while the latter is conveniently growable. Unlike Box<str>, Box<[T]> is actually used in real life; the vec! macro uses it for efficiency, as a Box<[T]> can be written out literally and then converted to a Vec<T> at no cost.

Arithmetic operation in Assembly

I am learning assembly language. I find that arithmetic in assembly can be either signed or unsigned. The rules are different for the two types of arithmetic, and I find it is the programmer's headache to decide which rules to apply. So a programmer should know beforehand whether the arithmetic involves negative numbers or not: if yes, signed arithmetic rules should be used; otherwise, the simpler and easier unsigned arithmetic will do.
The main problem I find with unsigned arithmetic is: what if the result is larger than its storage area? It can easily be solved by using a bigger-than-required storage area for the data, but that will consume extra bytes and the size of the data segment would increase. If size is no issue, can't we use this technique freely?
If you are the programmer, you are in control of your data representation within the bounds of the requirements of your software's target domain. This means you need to know well before you actually start touching code what type of data you are going to be dealing with, how it is going to be arranged (in the case of complex data types) and how it is going to be encoded (floating-point/unsigned integer/signed integer, etc.). It is "safest" to use the operations that match the type of the data you're manipulating which, if you've done your design right, you should already know.
It's not that simple. Most arithmetic operations are sign agnostic: they are neither signed nor unsigned.
The interpretation of the result, which is determined by the program's specification, is what makes them signed or unsigned, not the operation itself. The proper flavor of compare instruction, however, always has to be chosen carefully.
In some CPU architectures there are distinct signed and unsigned divide instructions, but that is about as far as it goes. Most CPUs also have two flavors of right-shift instruction: an arithmetic shift that preserves (replicates) the high bit, and a logical shift that replaces it with zero; these serve signed and unsigned handling, respectively.
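To illustrate the point, here is a small C++ sketch (values chosen arbitrarily) showing that addition produces the same bit pattern either way, while comparisons and right shifts depend on the signed/unsigned interpretation:

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t ua = 0xF0, ub = 0x20;   // 240 and 32 if unsigned; -16 and 32 if signed

    // Addition is sign-agnostic: the resulting bits are 0x10 either way.
    uint8_t sum = static_cast<uint8_t>(ua + ub);
    std::printf("sum bits: 0x%02X\n", static_cast<unsigned>(sum));

    // Comparison is not: the right flavor depends on the interpretation.
    std::printf("unsigned 0xF0 > 0x20: %d\n", ua > ub);                      // 1
    std::printf("signed   0xF0 > 0x20: %d\n",
                static_cast<int8_t>(ua) > static_cast<int8_t>(ub));          // 0

    // Logical vs. arithmetic right shift (arithmetic on typical targets).
    std::printf("logical    0xF0 >> 4: 0x%02X\n",
                static_cast<unsigned>(ua >> 4));                             // 0x0F
    std::printf("arithmetic 0xF0 >> 4: 0x%02X\n",
                static_cast<unsigned>(
                    static_cast<uint8_t>(static_cast<int8_t>(ua) >> 4)));    // 0xFF
    return 0;
}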

What the best ways to use decimals and datetimes with protocol buffers?

I would like to find out the optimum way of storing some common data types that are not included in the list supported by protocol buffers.
datetime (seconds precision)
datetime (milliseconds precision)
decimals with fixed precision
decimals with variable precision
lots of bool values (if you have lots of them, it looks like you'll have 1-2 bytes of overhead for each of them due to their tags)
Also the idea is to map them very easy to corresponding C++/Python/Java data types.
The protobuf design rationale is most likely to keep data type support as "native" as possible, so that it's easy to adopt new languages in the future. I suppose they could provide built-in message types, but where do you draw the line?
My solution was to create two message types:
DateTime
TimeSpan
This is only because I come from a C# background, where these types are taken for granted.
In retrospect, TimeSpan and DateTime may have been overkill, but it was a "cheap" way of avoiding conversion from h/m/s to s and vice versa; that said, it would have been simple to just implement a utility function such as:
int TimeUtility::ToSeconds(int h, int m, int s)
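(A trivial possible body, just for illustration, would be return h * 3600 + m * 60 + s;.)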
Bklyn pointed out that heap memory is used for nested messages; in some cases this is clearly very valid - we should always be aware of how memory is used. But in other cases this can be of less concern, where we're worried more about ease of implementation (this is the Java/C# philosophy, I suppose).
There's also a small disadvantage to using non-intrinsic types with the protobuf TextFormat::Printer; you cannot specify the format in which it is displayed, so it'll look something like:
my_datetime {
  seconds: 10
  minutes: 25
  hours: 12
}
... which is too verbose for some. That said, it would be harder to read if it were represented in seconds.
To conclude, I'd say:
If you're worried about memory/parsing efficiency, use seconds/milliseconds.
However, if ease of implementation is the objective, use nested messages (DateTime, etc).
Here are some ideas based on my experience with a wire protocol similar to Protocol Buffers.
datetime (seconds precision)
datetime (milliseconds precision)
I think the answer to these two would be the same; you would just typically be dealing with a smaller range of numbers in the case of seconds precision.
Use a sint64/sfixed64 to store the offset in seconds/milliseconds from some well-known epoch like midnight GMT 1/1/1970. This is how Date objects are internally represented in Java. I'm sure there are analogs in Python and C++.
If you need time zone information, pass around your date/times in terms of UTC and model the pertinent time zone as a separate string field. For that, you can use the identifiers from the Olson Zoneinfo database since that has become somewhat standard.
This way you have a canonical representation for date/time, but you can also localize to whatever time zone is pertinent.
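As a sketch (C++ with std::chrono; the struct and field names here are invented, not part of Protocol Buffers), converting to and from such a milliseconds-since-epoch field might look like this:

#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical wire representation: milliseconds since the Unix epoch in UTC,
// plus an Olson/IANA zone identifier carried as a separate string field.
struct WireDateTime {
    int64_t millis_utc;
    std::string tz;   // e.g. "Europe/Paris"
};

WireDateTime to_wire(std::chrono::system_clock::time_point tp, std::string tz) {
    using namespace std::chrono;
    return { duration_cast<milliseconds>(tp.time_since_epoch()).count(), std::move(tz) };
}

std::chrono::system_clock::time_point from_wire(const WireDateTime& w) {
    using namespace std::chrono;
    return time_point_cast<system_clock::duration>(
        system_clock::time_point{} + milliseconds(w.millis_utc));
}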
decimals with fixed precision
My first thought is to use a string similar to how one constructs Decimal objects from Python's decimal package. I suppose that could be inefficient relative to some numerical representation.
There may be better solutions depending on what domain you're working with. For example, if you're modeling a monetary value, maybe you can get away with using a uint32/64 to communicate the value in cents as opposed to fractional dollar amounts.
There are also some useful suggestions in this thread.
decimals with variable precision
Doesn't Protocol Buffers already support this with float/double scalar types? Maybe I've misunderstood this bullet point.
Anyway, if you need to go beyond those scalar types, you can encode the IEEE-754 representation into a uint32 or uint64 (for float and double, respectively). For example, Java lets you extract the IEEE-754 bit pattern from Float/Double objects and vice versa. There are analogous mechanisms in C++/Python.
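In C++ specifically, a sketch of that round trip (using C++20's std::bit_cast; memcpy achieves the same on older standards):

#include <bit>
#include <cstdint>

// Round-trip a double through its IEEE-754 bit pattern, suitable for storing
// in a protobuf uint64 field.
uint64_t encode_double(double d)   { return std::bit_cast<uint64_t>(d); }
double   decode_double(uint64_t b) { return std::bit_cast<double>(b); }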
lots of bool values (if you have lots of them, it looks like you'll have 1-2 bytes of overhead for each of them due to their tags)
If you are concerned about wasted bytes on the wire, you could use bit-masking techniques to compress many booleans into a single uint32 or uint64.
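For instance, a sketch of that masking in C++ (the flag names and their meanings are made up):

#include <cstdint>

// Several booleans packed into a single uint32 field.
enum Flag : uint32_t {
    kIsActive   = 1u << 0,
    kIsVerified = 1u << 1,
    kIsAdmin    = 1u << 2,
};

inline uint32_t set_flag(uint32_t mask, Flag f) { return mask | f; }
inline bool     has_flag(uint32_t mask, Flag f) { return (mask & f) != 0; }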
Because there isn't first-class support in Protocol Buffers, all of these techniques require a bit of a gentlemen's agreement between agents. Perhaps using a naming convention on your fields like "_dttm" or "_mask" would help communicate when a given field has additional encoding semantics above and beyond the default behavior of Protocol Buffers.
Sorry, not a complete answer, but a "me too".
I think this is a great question, one I'd love an answer to myself. The inability to natively describe fundamental types like datetimes and (for financial applications) fixed-point decimals, or to map them to language-specific or user-defined types, is a real killer for me. It's more or less prevented me from being able to use the library, which I otherwise think is fantastic.
Declaring your own "DateTime" or "FixedPoint" message in the proto grammar isn't really a solution, because you'll still need to convert your platform's representation to/from the generated objects manually, which is error prone. Additionally, these nested messages get stored as pointers to heap-allocated objects in C++, which is wildly inefficient when the underlying type is basically just a 64-bit integer.
Specifically, I'd want to be able to write something like this in my proto files:
message Something {
  required fixed64 time = 1 [cpp_type="boost::posix_time::ptime"];
  required int64 price = 2 [cpp_type="fixed_point<int64_t, 4>"];
  ...
};
And I would be required to provide whatever glue was necessary to convert these types to/from fixed64 and int64 so that the serialization would work. Maybe through something like adobe::promote?
For datetime with millisecond resolution I used an int64 that has the datetime as YYYYMMDDHHMMSSmmm. This makes it both concise and readable, and surprisingly, will last a very long time.
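A small sketch in C++ of packing and unpacking that representation (the function names are illustrative):

#include <cstdint>

// Pack calendar fields into a single int64 of the form YYYYMMDDHHMMSSmmm,
// e.g. 2009-06-15 13:45:30.125 -> 20090615134530125.
int64_t pack_datetime(int year, int month, int day,
                      int hour, int minute, int second, int milli) {
    int64_t v = year;
    v = v * 100 + month;
    v = v * 100 + day;
    v = v * 100 + hour;
    v = v * 100 + minute;
    v = v * 100 + second;
    v = v * 1000 + milli;
    return v;
}

void unpack_datetime(int64_t v, int& year, int& month, int& day,
                     int& hour, int& minute, int& second, int& milli) {
    milli  = static_cast<int>(v % 1000); v /= 1000;
    second = static_cast<int>(v % 100);  v /= 100;
    minute = static_cast<int>(v % 100);  v /= 100;
    hour   = static_cast<int>(v % 100);  v /= 100;
    day    = static_cast<int>(v % 100);  v /= 100;
    month  = static_cast<int>(v % 100);  v /= 100;
    year   = static_cast<int>(v);
}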
For decimals, I used byte[], knowing that there's no better representation that won't be lossy.
