I'm reading Low Level Programming by Igor Zhirkov, currently in topic 2.5 about addressing.
He shows a way to address memory directly; the example is as follows:
buffer: dq 8841, 99, 00
...
mov rax, [buffer + 8]
I know the dq creates a qword, but these values don't tell me anything about its purpose. The author says "the address in this instruction was preprocessed, as the base and the offset are constants controlled by the compiler".
Can anyone help me understand this statement?
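As a rough illustration of what "preprocessed" could mean here, the sketch below models the example in Python. The three dq values sit one after another, 8 bytes each, so buffer + 8 is the address of the second value, and because both buffer and 8 are constants their sum can be worked out before the program runs. The base address 0x1000 and the helper names are made up for the example (the real address is assigned by the assembler and linker), and little-endian layout is assumed, as on x86-64.

```python
import struct

# Hypothetical load address, chosen only for illustration.
BUFFER_BASE = 0x1000

# dq 8841, 99, 00 -> three consecutive little-endian 64-bit values.
memory = struct.pack("<3Q", 8841, 99, 0)

def effective_address(base, offset):
    # Both operands are constants, so this sum can be folded into a
    # single number before the program ever runs.
    return base + offset

def load_qword(address):
    # Model of `mov rax, [address]`: read 8 bytes starting at that address.
    return struct.unpack_from("<Q", memory, address - BUFFER_BASE)[0]

addr = effective_address(BUFFER_BASE, 8)
rax = load_qword(addr)
print(hex(addr), rax)
```

With these assumptions the load reads the second qword, so rax ends up holding 99.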
Related
Suppose we have:
MOV #NUM, R0
I understand that the hash sign represents an immediate addressing mode. However, I don't understand what exactly gets stored in R0 in this case. Is it the actual address of NUM?
I am trying to boot my small ARMv7 kernel (which runs just fine using qemu vexpress model) in ARMv8 Foundation Model v2.1. The model boots at level EL3 / 64 bits, and I managed to go down to level EL1 / 32 bits, but I encounter some issues (in a few words, the timer doesn't tick and some kprintf are missing, but that's not the issue here).
To debug my UART issue, I wanted to use the LEDs / switches provided by the model. I can read their values from software quite easily, but I can't write a new value to either of them. The kernel seems to hang. Here is minimal asm code that writes to the switches register:
.global Start
Start:
# we are in EL3 / 64 bits mode
# create the 0x1C010000 + 0x4 address of switches
mov x0, #4
movk x0, #0x1c01, lsl #16
# value to write
mov w1, #0xaa
# actual writing
strb w1, [x0]
It seems I am stuck at the strb instruction. For the record, if I replace strb with ldrb, I can correctly read and display the value of this register (I played with the --switches flag to be sure it worked).
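As a sanity check on the address construction only (it says nothing about why the store aborts), here is a small Python model of the mov / movk pair. The function names are made up for the sketch, and mov is modelled only for a 16-bit immediate, which is all the code above uses.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def mov(imm16):
    # mov x0, #imm : the whole register is set to the immediate.
    return imm16 & 0xFFFF

def movk(reg, imm16, shift):
    # movk: replace one 16-bit field, keep every other bit of the register.
    field = 0xFFFF << shift
    return ((reg & ~field) | ((imm16 & 0xFFFF) << shift)) & MASK64

x0 = mov(4)                 # mov  x0, #4
x0 = movk(x0, 0x1C01, 16)   # movk x0, #0x1c01, lsl #16
print(hex(x0))
```

This gives 0x1c010004, i.e. 0x1C010000 + 0x4, so the address itself is built as intended.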
Does anyone know what I am doing wrong here?
EDIT: thanks to unixsmurf's suggestions, I now know that I get a synchronous Data Abort exception with no level change, and that the reason is "Synchronous External Abort". I don't know how to investigate further; I guess I'll try ARM's forum.
Best,
V.
The ARM community finally solved the problem. The complete discussion can be found here.
If I have some pointer or pointer-like values packed into an SSE or AVX register, is there any particularly efficient way to dereference them, into another such register? ("Particularly efficient" meaning "more efficient than just using memory for the values".) Is there any way to dereference them all without writing an intermediate copy of the register out to memory?
Edit for clarification: that means, assuming 32-bit pointers and SSE, to index into four arbitrary memory areas at once with the four sections of an XMM register and return four results at once to another register. Or as close to "at once" as possible. (/edit)
Edit2: thanks to PaulR's answer I guess the terminology I'm looking for is "gather", and the question therefore is "what's the best way to implement gather for systems pre-AVX2?".
I assume there isn't an instruction for this since ...well, one doesn't appear to exist as far as I can tell and anyway it doesn't seem to be what SSE is designed for at all.
("Pointer-like value" meaning something like an integer index into an array pretending to be the heap; mechanically very different but conceptually the same thing. If, say, one wanted to use 32-bit or even 16-bit values regardless of the native pointer size, to fit more values in a register.)
Two possible reasons I can think of why one might want to do this:
thought it might be interesting to explore using the SSE registers for general-purpose... stuff, perhaps to have four identical 'threads' processing potentially completely unrelated/non-contiguous data, slicing through the registers "vertically" rather than "horizontally" (i.e. instead of the way they were designed to be used).
to build something like romcc if for some reason (probably not a good one), one didn't want to write anything to memory, and therefore would need more register storage.
This might sound like an XY problem, but it isn't, it's just curiosity/stupidity. I'll go looking for nails once I have my hammer.
The question is not entirely clear, but if you want to dereference vector register elements then the only instructions which might help you here are AVX2's gathered loads, e.g. _mm256_i32gather_epi32 et al. See the AVX2 section of the Intel Intrinsics Guide.
SYNOPSIS
__m256i _mm256_i32gather_epi32 (int const* base_addr, __m256i vindex, const int scale)
#include "immintrin.h"
Instruction: vpgatherdd ymm, vm32x, ymm
CPUID Flag : AVX2
DESCRIPTION
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
OPERATION
FOR j := 0 to 7
    i := j*32
    dst[i+31:i] := MEM[base_addr + SignExtend(vindex[i+31:i])*scale]
ENDFOR
dst[MAX:256] := 0
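To make the OPERATION pseudocode concrete, here is a plain Python model of it. This is only a sketch of the semantics, not of the instruction's performance: memory is a hypothetical bytes object, base_addr is a byte offset into it, and the function name is made up. The loop over lanes is also roughly what a pre-AVX2 fallback has to do, one scalar load per element.

```python
def sign_extend32(value):
    # Interpret the low 32 bits of value as a signed integer.
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def gather_i32(memory, base_addr, vindex, scale):
    # One independent 32-bit load per lane, as in the FOR loop above.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    dst = []
    for index in vindex:
        addr = base_addr + sign_extend32(index) * scale
        dst.append(int.from_bytes(memory[addr:addr + 4], "little", signed=True))
    return dst

# Hypothetical memory image: sixteen 32-bit integers 100..115.
memory = b"".join(n.to_bytes(4, "little", signed=True) for n in range(100, 116))

result = gather_i32(memory, 0, [7, 0, 3, 3, 15, 1, 8, 2], 4)
print(result)
```

With scale 4 each index selects a whole 32-bit element, so the call above returns [107, 100, 103, 103, 115, 101, 108, 102]. Indices may repeat and need not be in order.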
So if I understood this correctly, your title is misleading and you really want to:
index into the concatenation of all XMM registers
with an index held in a part of an XMM register
Right?
That's hard. And a little weird, but I'm OK with that.
Assuming crazy tricks are allowed, I propose self-modifying code: (not tested)
pextrb eax, xmm?, ? ; question marks are the position of the pointer
mov edx, eax
shr eax, 1
and eax, 0x38
add eax, 0xC0 ; C0 makes "hack" put its result in eax
mov [hack+4], al ; ModRM byte: selects xmm{al}
and edx, 15
mov [hack+5], dl ; immediate: byte [dl] of the xmm reg
call hack
pinsrb xmm?, eax, ? ; put value back somewhere
...
hack:
db 0x66, 0x0F, 0x3A, 0x14, 0x00, 0x00 ; pextrb ?, ?, ? (last two bytes are patched above)
ret
As far as I know, you can't do that with full ymm registers (yet?). With some more effort, you could extend it to xmm8-xmm15. It's easily adjustable to other "pointer" sizes and other element sizes.
I'm processing 6-byte messages from a piece of serial hardware.
In their manual, the manufacturer has laid out that the checksum of each message (its 6th byte) is composed of 'the low byte of the summation of the rest of the message.'
Here is one of their examples, dissected
Here are some others
I haven't tried all those examples yet, let me show my work on the first 'dissected' example:
This is the formula as provided:
Low byte of 0xB2 + 0x00 + 0x69 + 0x1A + 0x83 = 0x68
So, the summation is 0x1B8; if I take the low 8 bits, I get 0xB8
Hmmm... am I doing that wrong?
I thought for a bit and guessed, oh, maybe they just do a bitwise-operation instead, that's pretty common on older hardware right?
So I wrote out the bits of each part and XORed the series together...
0xB2 ^ 0x00 = 0xB2 (duh)
0xB2 ^ 0x69 = 0xDB
0xDB ^ 0x1A = 0xC1
0xC1 ^ 0x83 = 0x42
I did this by hand, and by calculator. Same result.
I was able to reproduce my computations in my program, but my checksums are quite different from what the hardware is outputting. The manual's model number matches the hardware I have...
Looking at the binary of each part of the summation, I'm not sure I can see a clear pattern in each of their documented outputs. In some checksums, like the IPv4 header, the carry is shifted or added back into the checksum; could that be the case here?
My question is:
Am I making a math error in how this checksum is being calculated?
Any help would be greatly appreciated! Thank you.
I just attacked all the samples with the Windows RT calculator, and all of the others (here) are fine - it's just the first example (which you dissected) that is erroneous. This looks like a simple documentation typo.
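For anyone who wants to repeat the check on the dissected example, here is a short Python sketch that tries the three interpretations mentioned in the question: low byte of the sum, XOR of the bytes, and a sum with the carry added back in. The function names are made up; the payload bytes and the documented checksum 0x68 come from the example above.

```python
from functools import reduce
from operator import xor

def low_byte_of_sum(payload):
    # Add all bytes and keep only the low 8 bits.
    return sum(payload) & 0xFF

def xor_of_bytes(payload):
    # XOR all bytes together.
    return reduce(xor, payload, 0)

def sum_with_end_around_carry(payload):
    # Add the carry back into the low byte until the result fits in 8 bits.
    total = sum(payload)
    while total > 0xFF:
        total = (total & 0xFF) + (total >> 8)
    return total

payload = bytes([0xB2, 0x00, 0x69, 0x1A, 0x83])
documented = 0x68

for name, fn in [("low byte of sum", low_byte_of_sum),
                 ("xor", xor_of_bytes),
                 ("end-around carry", sum_with_end_around_carry)]:
    value = fn(payload)
    print(f"{name}: 0x{value:02X} matches={value == documented}")
```

None of the three produces 0x68 for this message (they give 0xB8, 0x42 and 0xB9), which is consistent with the dissected example containing a typo rather than with a different algorithm.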
I have a pointer to an array, DI.
Is it possible to go to the value pointed to by both DI and another pointer?
e.g:
mov bl,1
mov bh,10
inc [di+bl]
inc [di+bh]
And, on a related note, is there a single line opcode to swap the values of two registers? (In my case, BX and BP?)
For 16-bit programs, the only supported addressing forms are:
[BX+SI]
[BX+DI]
[BP+SI]
[BP+DI]
[SI]
[DI]
[BP]
[BX]
Each of these may include either an 8- or 16-bit constant displacement.
(Source: Intel Developer's Manual volume 2A, page 38)
The problem with the example provided is that bl and bh are eight-bit registers and cannot be used as the base pointer. However, if you set bx to the desired value then inc [di+bx] (with a suitable size specifier for the pointer) is valid.
As for swapping "the high and low bits of a register," J-16 SDiZ's suggestion of ror bx, 8 is fine for exchanging bl and bh (and IIRC, it is the optimal way to do so). However, if you want to exchange bit 0 of (say) bl with bit 7 of bl, you'll need more logic than that.
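To see why a rotate by 8 exchanges the two halves of a 16-bit register, here is a small Python model of ror on a 16-bit value. The function name is made up; the starting value uses bl = 1 and bh = 10 from the question.

```python
def ror16(value, count):
    # Rotate a 16-bit value right; bits shifted out re-enter at the top.
    count %= 16
    value &= 0xFFFF
    return ((value >> count) | (value << (16 - count))) & 0xFFFF

bx = (10 << 8) | 1        # bh = 10, bl = 1  -> 0x0A01
swapped = ror16(bx, 8)    # models: ror bx, 8

bh, bl = swapped >> 8, swapped & 0xFF
print(hex(swapped), bh, bl)
```

After the rotate bx is 0x010A, so bh holds 1 and bl holds 10.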
DI is not a pointer, it is an index.
You can use ROR BX, 8 to swap the lower and higher bytes of a register.