x86 Instruction Encoding C++ Pointer as Address - pointers

I'm trying to encode x86 instructions to take an address specified by a C++ pointer. For example, I have a 32 bit number that I would like to move to the eax register. In assembly, this is easy.
mov eax, number
I would like to encode in binary the x86 instruction to do this using a C++ pointer to the number. The opcode for mov (memory to register 32 bit) is 0x8B. I am not sure what to use for the Mod-Reg-R/M byte and the displacement. Is there a way to encode the address from the pointer directly? Or do I have to do some displacement math? Also, I don't really understand how displacement works.
The instructions will be stored and called in dynamically allocated memory. This is for a dynamic recompiler for an emulator. I have tried all sorts of combinations with Mod-Reg-R/M and the pointer, with no luck. I also haven't been able to find anything online explaining how this works.

Related

int32_t for stack in 32 microcontroller

In bellow example why we should use int32_t instead of uint32_t ? (platform is ARM 32 bit microcontroller)
struct tcb{
int32_t *stackPt;
struct tcb *nextPt;
};
It's a part of RTOS tutorial. and tcb is for thread control block .
why we should use int32_t* for stack ?
There is no particular reason that you should use pointer to signed rather than unsigned.
You are probably never going to dereference this pointer directly to access the words on the stack. If you do want to access the data on the stack, some of it will be signed, some unsigned, and some neither (strings etc) so the pointer type will not help you with that.
When you want to pass around a pointer but never dereference it then one convention is to use a pointer to void, but that convention isn't so popular in embedded system code.
One reason to use a pointer to a 32-bit integer is to suggest that the pointer is at least word-aligned. If you intend on complying with the ARM EABI (which you should) then the stack should be doubleword (64-bit) aligned at the entry to every EABI compliant function. To hint that that is the case you might want to even use a (u)int64_t pointer. This might be misleading though because not everything on the stack is 64-bit or 32-bit aligned, just the whole frames.

What's the difference between *uint and uintptr in Golang?

According to the Golang tour, we're provided with the following integer types:
int int8 int16 int32 int64
uint uint8 uint16 uint32 uint64 uintptr
In theory, that means we could also have pointers to all of these types as follows:
*int *int8 *int16 *int32 *int64
*uint *uint8 *uint16 *uint32 *uint64 *uintptr
If this is the case, then we already have a pointer to a uint in the form of *uint. That would make uintptr redundant. The official documentation doesn't shed much light on this:
uintptr is an integer type that is large enough to hold the bit pattern of any pointer.
As I understand it, that means that the bit width of a uint is determined at compile time based on the target architecture (typically either 32-bit or 64-bit). It seems logical that the pointer width should scale to the target architecture as well (IE: a 32-bit *uint points to a 32-bit uint). Is that the case in Golang?
Another thought was that maybe uintptr was added to make the syntax less confusing when doing multiple indirection (IE: foo *uinptr vs foo **uint)?
My last thought is that perhaps pointers and integers are incompatible data types in Golang. That would be pretty frustrating since the hardware itself doesn't make any distinction between them. For instance, a "branch to this address" instruction can use the same data from the same register that was just used in an "add this value" instruction.
What's the real point (pun intended) of uintptr?
The short answer is "never use uintptr". 😀
The long answer is that uintptr is there to bypass the type system and allow the Go implementors to write Go runtime libraries, including the garbage collection system, in Go, and to call C-callable code including system calls using C pointers that are not handled by Go at all.
If you're acting as an implementor—e.g., providing access to system calls on a new OS—you'll need uintptr. You will also need to know all the special magic required to use it, such as locking your goroutine to an OS-level thread if the OS is going to do stack-ish things to OS-level threads, for instance. (If you're using it with Go pointers, you may also need to tell the compiler not to move your goroutine stack, which is done with special compile-time directives.)
Edit: as kostix notes in a comment, the runtime system considers an unsafe.Pointer as a reference to an object, which keeps the object alive for GC. It does not consider a uintptr as such a reference. (That is, while unsafe.Pointer has a pointer type, uintptr has integer type.) See also the documentation for the unsafe package.
uintptr is simply an integer representation of a memory address, regardless of the actual type it points to. Sort of like void * in C, or just casting a pointer to an integer. It's purpose is to be used in unsafe black magic, and it is not used in everyday go code.
You are conflating uintptr and *uint. uintptr is used when you're dealing with pointers, it is a datatype that is large enough to hold a pointer. It is mainly used for unsafe memory access, look at the unsafe package. *uint is a pointer to an unsigned integer.

Compiling an Assembly Program using avr

why do we need to initialize stack pointer in the begnning of the program of AVR assembly programming
Your assembly program is calling a subroutine. When you do that, the return address is stored on the stack using the stack pointer, so it's important to initialize it to point to an appropriate place in RAM. The ATmega328P datasheet says:
During interrupts and subroutine calls, the return address Program Counter (PC) is stored on the Stack. The
Stack is effectively allocated in the general data SRAM, and consequently the Stack size is only limited by the
total SRAM size and the usage of the SRAM. All user programs must initialize the SP in the Reset routine
(before subroutines or interrupts are executed). The Stack Pointer (SP) is read/write accessible in the I/O space.
The data SRAM can easily be accessed through the five different addressing modes supported in the AVR
architecture.
Very simple, the answer comes straight frm the datasheet - look for Stack Pointer. Stack pointer initial value is 0x0000, meaning it would point to register R0 (which adress is 0x0000) if not initialized. You would not want that, as you use R0 and other register to perform operations. That is why you want to set the stack to some other memory area, specifically to the Internal SRAM (a general purpose RAM area).
It depends on the microcontroller you are using. Older AVRs had the stack pointer initialized by hardware to 0x0000. You had to change that to something sensible (most often RAMEND) before using subroutines or interrupts. Newer AVRs have the stack pointer initialized by hardware to RAMEND, so you do not need software initialization.
You will have to check the datasheet to see whether your particular MCU needs that software initialization or not. Where in doubt, do it anyway: it doesn't hurt (it takes only 4 CPU cycles) and it can make your code more portable. Also, a bootloader may have altered the stack pointer.

What's the point of __ptr32 and __ptr64?

As described in this MSDN article, Microsoft has these two type annotations to declare native pointers on different architectures. However, on the second line:
On a 32-bit system, a pointer declared with __ptr64 is truncated to a 32-bit pointer. On a 64-bit system, a pointer declared with __ptr32 is coerced to a 64-bit pointer.
This sounds to me like the declaration doesn't matter; if the architecture overrides the declaration of __ptrXX to be the default anyways, what's the point of marking __ptrXX in the first place?
I see that this answer says that it's for interop, but if the declarations are essentially overridden as above, how does that help with interop?
There's a big difference between declaring and assigning a 32-bit pointer and actually using it. In other words, dereferencing the pointer. If you do that in a 64-bit process then there is no other option but to sign-extend it to a 64-bit pointer. Which is what "coerced" means. That may work by accident, but you'd have to be pretty lucky. It just doesn't make sense to try.
The point of declaring a __ptr32 is as described in that linked answer, it only makes sense when you interop with a 32-bit process. Which uses 32-bit pointers. It is not common.

BigInt for Standard ML/NJ

Is there a Java BigInt equivalent for Standard ML? The normal int type throws an exception when it overflows.
Yes, see the IntInf structure.
The official SML'97 standard basis library introduces a zoo of structures like Int, IntInf, Int32, Int64, LargeInt etc.
To actually use them in practice to make things work as expected, and make them work efficiently, you need to look closely at the SML implementation at hand.
One family of implementations imitates the memory layout of C and Java, so Int32 will be really a 32bit machine word (but with overflow checking), and Int64 a 64bit machine word. SML/NJ is a notable example for that, and its small int arithmentic is fast, but its big int arithmentic slow.
Another family of implementations come from the background of symbolic computation (LISP or Computer Algebra), where Poly/ML is a notable example. Here you have Int = IntInf = LargeInt by default, and the implementation first uses (part of) the native machine word as approximation, until it overflows and then switches to really big integers that are allocated on the heap (as boxed values). Poly/ML uses the GNU MP library for that big part.
Thus Int/IntInf is very efficient as long as your application is about integers, not machine words of a specific size: Int32 in the symbolic model won't fit into a single word on 32bit hardware due to the extra tag bits that are required. So some algorithms that are actually about word arithmetic will degrade, for example SHA1 on 32bit hardware.
On the other hand, the implicit upgrade of shorter-than-wordsize int to heap-allocated big int gives you something better than BigInt in Java, because you won't need the full object overhead for small values: 42 will be just some bit pattern in a register (with additional tag bit), but not a heavy box on the heap.
The BigInt-equivalent is called LargeInt. See these lecture notes to see some functions on how to convert between int (aka Int) and LargeInt.
While this isn't exactly what you were asking, you don't actually want an equivalent to the Java BigInt class. Java's BigInt class implements O(n^2) time for multiplication (essentially multiplying the way it's taught in elementary school), instead of O(n log n), which is possible. This is really important, as a lot of trivial BigInt programming simply doesn't work with the n^2 version.
Well, int puts a nasty limit on stuff like calculating permutations. SML needs a large numeric datatype thats more natural to use.

Resources