BigInt for Standard ML/NJ - functional-programming

Is there an equivalent of Java's BigInteger for Standard ML? The normal int type raises an exception when it overflows.

Yes, see the IntInf structure.

The official SML'97 standard basis library introduces a zoo of structures like Int, IntInf, Int32, Int64, LargeInt etc.
To actually use them in practice to make things work as expected, and make them work efficiently, you need to look closely at the SML implementation at hand.
One family of implementations imitates the memory layout of C and Java, so Int32 really is a 32-bit machine word (but with overflow checking), and Int64 a 64-bit machine word. SML/NJ is a notable example of that: its small-int arithmetic is fast, but its big-int arithmetic is slow.
Another family of implementations comes from the background of symbolic computation (LISP or computer algebra), where Poly/ML is a notable example. Here you have Int = IntInf = LargeInt by default, and the implementation first uses (part of) the native machine word as an approximation, until it overflows and then switches to really big integers that are allocated on the heap (as boxed values). Poly/ML uses the GNU MP library for the big part.
Thus Int/IntInf is very efficient as long as your application is about integers, not machine words of a specific size: Int32 in the symbolic model won't fit into a single word on 32bit hardware due to the extra tag bits that are required. So some algorithms that are actually about word arithmetic will degrade, for example SHA1 on 32bit hardware.
On the other hand, the implicit upgrade of shorter-than-wordsize int to heap-allocated big int gives you something better than BigInt in Java, because you won't need the full object overhead for small values: 42 will be just some bit pattern in a register (with additional tag bit), but not a heavy box on the heap.
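The "machine word first, heap-allocated big integer on overflow" model described above can be sketched in a few lines. This is only an illustration written in Go (not SML, and not how Poly/ML is actually implemented); the names Int, FromInt64 and Add are hypothetical, and the sketch never demotes a big value back to a small one.

```go
package main

import (
	"fmt"
	"math/big"
)

// Int is a hypothetical integer that stays in a machine word while it fits
// and falls back to a heap-allocated big.Int once an operation overflows.
type Int struct {
	small int64
	big   *big.Int // nil while the value fits in small
}

// FromInt64 wraps a machine integer without allocating.
func FromInt64(v int64) Int { return Int{small: v} }

// toBig returns the value as a *big.Int, allocating only for small values.
func (x Int) toBig() *big.Int {
	if x.big != nil {
		return x.big
	}
	return big.NewInt(x.small)
}

// Add uses plain word arithmetic when both operands are small and the
// result does not overflow; otherwise it switches to big.Int arithmetic.
func Add(x, y Int) Int {
	if x.big == nil && y.big == nil {
		s := x.small + y.small // wraps around on overflow in Go
		// Overflow happened only if both operands differ in sign from s.
		if (x.small^s)&(y.small^s) >= 0 {
			return Int{small: s}
		}
	}
	return Int{big: new(big.Int).Add(x.toBig(), y.toBig())}
}

// String prints the decimal value in either representation.
func (x Int) String() string { return x.toBig().String() }

func main() {
	fmt.Println(Add(FromInt64(40), FromInt64(2)))
	fmt.Println(Add(FromInt64(9223372036854775807), FromInt64(1)))
}
```

The first sum never touches the heap-allocated representation; only the second, which does not fit into 64 bits, pays for a big integer.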

The BigInt-equivalent is called LargeInt. See these lecture notes to see some functions on how to convert between int (aka Int) and LargeInt.

While this isn't exactly what you were asking, you may not actually want an exact equivalent of Java's BigInteger class. BigInteger has traditionally implemented multiplication in O(n^2) time (essentially multiplying the way it's taught in elementary school), and only since Java 8 does it switch to Karatsuba and Toom-Cook multiplication for large operands; asymptotically faster algorithms, approaching O(n log n), are possible. This is really important, as a lot of seemingly trivial big-integer programming simply doesn't work with the n^2 version.

Well, int puts a nasty limit on stuff like calculating permutations. SML needs a large numeric datatype that's more natural to use.

Related

Why does V8 use pointer tagging and not NaN boxing?

I'm learning V8 internals now. I learned that V8 uses pointer tagging for storing values, but wondered why it does not use NaN boxing.
AFAIK, NaN boxing is better because it can also store doubles and not just SMIs. I've read this, and understand (if that's true) why not to use NaN boxing on 32-bit platforms. But on 64-bit platforms I don't see why.
I suspect the reason has something to do with SMIs. Maybe they can't be stored using NaN boxing? I think they can. We have 52 superfluous bits for them (we can even use more than 32 bits). Maybe this would require additional masking operations that would make integer math slower? But we already need to do a bitwise shift!
I don't know why. Thanks to anyone willing to answer.
(V8 developer here.) NaN boxing and pointer tagging are design choices with different tradeoffs, neither is strictly better than the other. V8's decision to use pointer tagging has been made long before I joined the project, so I can only speculate what the specific reason(s) might have been at the time.
Advantages of pointer tagging are:
significantly less memory consumption (certainly on 32-bit platforms; with "pointer compression" on 64-bit platforms too)
slightly more efficient (small) integer operations, because most CPUs' integer operations are faster than their double operations. This may not matter at all once an optimizing compiler enters the picture.
slightly more efficient pointer operations, because you can simply add an adjusted offset when accessing object fields (which has the same performance as not playing any pointer tricks at all), as opposed to having to mask off irrelevant parts of a NaN. This may not matter at all once an optimizing compiler enters the picture.
As you point out, the main benefit of NaN tagging is that it supports the full double range, which is very nice in some situations. You can build a well-performing engine based on either technique.
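To make the two encodings concrete, here is a minimal sketch in Go of how a small integer could be stored under each scheme. It is an illustration only, not V8's actual code: the helper names are made up, the NaN-boxing tag value 0xFFF9000000000000 is an arbitrary choice for this example, and real engines differ in layout details.

```go
package main

import (
	"fmt"
	"math"
)

// Pointer-tagging style: the low bit is the tag. 0 marks a small integer
// whose value sits in the upper 63 bits; 1 would mark a heap pointer.
func tagSmi(v int64) uint64   { return uint64(v) << 1 }
func isSmi(w uint64) bool     { return w&1 == 0 }
func untagSmi(w uint64) int64 { return int64(w) >> 1 } // arithmetic shift keeps the sign

// NaN-boxing style: every word is a double, and non-double values hide in
// the payload of a quiet NaN. The tag below is a hypothetical choice.
const nanBoxIntTag uint64 = 0xFFF9000000000000

func boxDouble(f float64) uint64 { return math.Float64bits(f) }
func boxInt32(v int32) uint64    { return nanBoxIntTag | uint64(uint32(v)) }
func isBoxedInt32(w uint64) bool { return w&0xFFFF000000000000 == nanBoxIntTag }
func unboxInt32(w uint64) int32  { return int32(uint32(w)) }

func main() {
	// Tagged small integers can be added without untagging first.
	sum := tagSmi(3) + tagSmi(4)
	fmt.Println(isSmi(sum), untagSmi(sum))

	// A NaN-boxed integer needs a mask to test and a truncation to extract.
	w := boxInt32(-7)
	fmt.Println(isBoxedInt32(w), unboxInt32(w))

	// An ordinary double is stored as-is and is not mistaken for an integer.
	fmt.Println(isBoxedInt32(boxDouble(1.5)))
}
```

In the tagged scheme addition works directly on the encoded words, which is the kind of cheap integer operation the answer refers to; in the NaN-boxed scheme doubles need no encoding at all, at the cost of 64-bit words and extra masking for everything else.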

What's the difference between *uint and uintptr in Golang?

According to the Golang tour, we're provided with the following integer types:
int int8 int16 int32 int64
uint uint8 uint16 uint32 uint64 uintptr
In theory, that means we could also have pointers to all of these types as follows:
*int *int8 *int16 *int32 *int64
*uint *uint8 *uint16 *uint32 *uint64 *uintptr
If this is the case, then we already have a pointer to a uint in the form of *uint. That would make uintptr redundant. The official documentation doesn't shed much light on this:
uintptr is an integer type that is large enough to hold the bit pattern of any pointer.
As I understand it, that means that the bit width of a uint is determined at compile time based on the target architecture (typically either 32-bit or 64-bit). It seems logical that the pointer width should scale to the target architecture as well (i.e. a 32-bit *uint points to a 32-bit uint). Is that the case in Golang?
Another thought was that maybe uintptr was added to make the syntax less confusing when doing multiple indirection (i.e. foo *uintptr vs foo **uint)?
My last thought is that perhaps pointers and integers are incompatible data types in Golang. That would be pretty frustrating since the hardware itself doesn't make any distinction between them. For instance, a "branch to this address" instruction can use the same data from the same register that was just used in an "add this value" instruction.
What's the real point (pun intended) of uintptr?
The short answer is "never use uintptr". 😀
The long answer is that uintptr is there to bypass the type system and allow the Go implementors to write Go runtime libraries, including the garbage collection system, in Go, and to call C-callable code including system calls using C pointers that are not handled by Go at all.
If you're acting as an implementor—e.g., providing access to system calls on a new OS—you'll need uintptr. You will also need to know all the special magic required to use it, such as locking your goroutine to an OS-level thread if the OS is going to do stack-ish things to OS-level threads, for instance. (If you're using it with Go pointers, you may also need to tell the compiler not to move your goroutine stack, which is done with special compile-time directives.)
Edit: as kostix notes in a comment, the runtime system considers an unsafe.Pointer as a reference to an object, which keeps the object alive for GC. It does not consider a uintptr as such a reference. (That is, while unsafe.Pointer has a pointer type, uintptr has integer type.) See also the documentation for the unsafe package.
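A small sketch of the distinction, assuming nothing beyond the standard unsafe package: a uintptr is just a number, so pointer arithmetic has to convert to uintptr and back to unsafe.Pointer within a single expression, so that no bare integer is ever the only thing referring to the object. The helper names secondElement and samePointerWidth are made up for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// secondElement reads arr[1] through pointer arithmetic. The round trip
// unsafe.Pointer -> uintptr -> unsafe.Pointer happens in one expression,
// which is the pattern the unsafe package documentation allows.
func secondElement(arr *[4]int32) int32 {
	p := unsafe.Pointer(uintptr(unsafe.Pointer(&arr[0])) + unsafe.Sizeof(arr[0]))
	return *(*int32)(p)
}

// samePointerWidth reports whether a uintptr occupies as many bytes as a
// pointer, which is what "large enough to hold the bit pattern" amounts to.
func samePointerWidth() bool {
	var x int
	return unsafe.Sizeof(uintptr(0)) == unsafe.Sizeof(&x)
}

func main() {
	arr := [4]int32{10, 20, 30, 40}
	fmt.Println(secondElement(&arr))
	fmt.Println(samePointerWidth())
}
```

Storing the intermediate uintptr in a variable and converting it back later would not be safe in general, because the garbage collector does not treat that integer as a reference.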
uintptr is simply an integer representation of a memory address, regardless of the actual type it points to. Sort of like uintptr_t in C, or just casting a pointer to an integer. Its purpose is to be used in unsafe black magic, and it is not used in everyday Go code.
You are conflating uintptr and *uint. uintptr is used when you're dealing with pointers; it is a datatype that is large enough to hold a pointer. It is mainly used for unsafe memory access (look at the unsafe package). *uint is a pointer to an unsigned integer.

How to declare an unsigned long type in java?

I need my program to work with big natural numbers and zero. The program itself is not important to this question, or at least I think it is not. I looked up which primitive data type would suit my aim best and I found the unsigned long.
According to the website, unsigned longs are supported from Java 8 onwards. However, it does not say how to declare a variable as an unsigned long.
By googling, I find pages complaining about the lack of unsigned data types compared to C++ (from which I know the principle of unsigned primitive types).
So my question is, how to declare an unsigned long type in Java?
The aim of the big number is to make the implementation slower. The reason for that is to compare two methods doing the same job. It is a university assignment, so I am not interested in how much sense this makes.
If unsigned types do not work in Java, or only very inconveniently, which primitive data type allows the use of the largest positive whole numbers? Is long or double better suited?
Don't think you can.
You could however try this

What's the difference between pointer and value in struct?

Given the following struct:
type Exp struct {
	foo int
	bar *int
}
What is the difference in terms of performance between using a pointer and using a value in a struct? Is there any overhead, or are these just two schools of Go programming?
I would use pointers to implement a chained struct, but is this the only case where we have to use pointers in a struct in order to gain performance?
PS: in the above struct we talk about a simple int, but it could be any other type (even a custom one).
Use the form which is most functionally useful for your program. Basically, this means if it's useful for the value to be nil, then use a pointer.
From a performance perspective, primitive numeric types are always more efficient to copy than to dereference a pointer. Even more complex data structures are still usually faster to copy if they are smaller than a cache line or two (under 128 bytes is a good rule of thumb for x86 CPUs).
When things get a little larger, you need to benchmark if performance concerns you. CPUs are very efficient at copying data, and there are so many variables involved which will determine the locality and cache friendliness of your data, it really depends on your program's behavior, and the hardware you're using.
This is an excellent series of articles if you want to better understand how memory and software interact: "What every programmer should know about memory".
In short, I tell people to choose a pointer or not based on the logic of the program, and worry about performance later.
Use a pointer if you need to pass something to be modified.
Use a pointer if you need to determine if something was unset/nil.
Use a pointer if you are using a type that has methods with pointer receivers.
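The semantic difference behind these rules can be seen with the Exp struct from the question. The following is a minimal sketch; the function copyAndModify is a hypothetical helper introduced only for this example.

```go
package main

import "fmt"

// Exp mirrors the struct from the question: one value field, one pointer field.
type Exp struct {
	foo int
	bar *int
}

// copyAndModify receives a copy of e. Changing the value field affects only
// the copy, while writing through the pointer field changes shared data.
func copyAndModify(e Exp) Exp {
	c := e
	c.foo = 100
	if c.bar != nil { // a nil pointer means "unset"
		*c.bar = 200
	}
	return c
}

func main() {
	n := 2
	e := Exp{foo: 1, bar: &n}
	c := copyAndModify(e)

	fmt.Println(e.foo, c.foo) // the value field was copied, so e.foo is unchanged
	fmt.Println(n, *e.bar)    // the pointed-to int is shared, so both see the update

	var unset Exp
	fmt.Println(unset.bar == nil) // the zero value of a pointer field is nil
}
```

The value field gives each copy its own data; the pointer field gives sharing and a distinguishable "not set" state, which is why the choice is usually driven by program logic rather than by speed.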
If the size of a pointer is less than that of the struct member, then using a pointer can be more efficient, since you don't need to copy the member but just its address. Also, if you want to be able to move or share some part of a structure, it is better to have a pointer so that you can, again, share only the address of the member. See also the Go FAQ.

Program extraction using native integers/words (not bignums) from Isabelle theory

This question comes in a context where Isabelle is used with formal software development in mind more than with pure maths theorization in mind (and from a standalone developer's context).
It seems that, at best, SML programs generated from an Isabelle theory use SML's IntInf.int, not the native integer type Int.int, even if Code_Target_Int, Code_Binary_Nat or Code_Target_Nat is used. Investigating the sources of these theories seems to confirm that this is all they can do. Native platform integers may be required for multiple reasons, including efficiency and the case where the SML imperative program is to be optionally translated into an imperative language subset (e.g. C or Ada), which is relevant when the theory relies on the Imperative_HOL theory. The codegen.pdf document that comes with the Isabelle distribution did not help with this, except in suggesting the first of the options below.
Options may be:
1. Not using Isabelle's int and nat, and instead re-creating a new numeric type from scratch, then using the code_printing command (with its type_constructor and constant entries) to give it the native platform representation and operations (this implies including range limitations in the theory in some way): this must be tedious, although hopefully not error-prone, thanks to the formal environment. Note that this does not seem feasible with Isabelle's own int and nat: it makes code generation fail, and nothing tells which constants are missing in the code_printing command.
2. If the SML program is to be compiled directly (e.g. with MLton), tweak the SML environment with a replacement IntInf structure: this may be unsafe or not feasible, and still requires embedding the range limitations in the theory, so the previous option may in the end be better than this one.
3. Edit the generated program to change IntInf into Int: easy, but is it safe? (At least IntInf implements the same signature as Int does, so maybe it is safe.) As above, this requires specifying bounds in the theory in some way; I'm OK with that.
4. Dive into Isabelle internals: surely unreasonable, even worse than the second option.
There exists a Word theory, but according to some readings, it seems not to be suited for that purpose.
Are there other known options not listed here? Are there comments on the listed options?
If there is no ready-to-cook solution (I feel there is none at this time), what hints or tracks would be best to follow? (e.g. links to documents, mentions of concepts)
Update
Points #2 and #3 of the list may be OK (if they really are) only if there is a single integer type. If the program uses more than one, they are not applicable.
Directly generating native words from Isabelle int would be unsound, because your formalisation would not take overflow into account where it exists in reality.
It looks like the AFP entry Native_Word does what you want, though:
http://afp.sourceforge.net/entries/Native_Word.shtml

Resources