Instruction set - Decode opcode

I'm trying to understand how the /d affects the opcode.
Example:
FF /6 PUSH r/m16 M Valid Valid Push r/m16.
How is its meaning expressed?
Can anyone give me an example of the difference?
Thanks!

There are actually many instructions using FF as opcode:
INC rm16 FF /0
INC rm32 FF /0
INC rm64 FF /0
DEC rm16 FF /1
DEC rm32 FF /1
DEC rm64 FF /1
CALL rm16 FF /2
CALL rm32 FF /2
CALL rm64 FF /2
CALL FAR mem16:16 FF /3
CALL FAR mem16:32 FF /3
JMP rm16 FF /4
JMP rm32 FF /4
JMP rm64 FF /4
JMP FAR mem16:16 FF /5
JMP FAR mem16:32 FF /5
PUSH rm16 FF /6
PUSH rm32 FF /6
PUSH rm64 FF /6
As you can see, the /d part is a 3-bit field held in the byte following the opcode (the so-called ModR/M byte), which helps select the correct instruction.
From the Intel reference documentation:
Many instructions that refer to an operand in memory have an addressing-form specifier byte (called the ModR/M byte) following the primary opcode. The ModR/M byte contains three fields of information:
• The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes.
• The reg/opcode field specifies either a register number or three more bits of opcode information. The purpose of the reg/opcode field is specified in the primary opcode.
• The r/m field can specify a register as an operand, or it can be combined with the mod field to encode an addressing mode. Sometimes, certain combinations of the mod field and the r/m field are used to express opcode information for some instructions.
So that /d value is actually the reg/opcode field. When the CPU fetches the FF opcode, it knows that a ModR/M byte follows, and it reads that field to determine which instruction to execute.
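To make the field split concrete, here is a minimal Go sketch (illustrative only; the instruction names simply mirror the FF table above, and the example byte is arbitrary):

package main

import "fmt"

func main() {
    // An arbitrary ModR/M byte: 0x35 = 00 110 101 in binary.
    modrm := byte(0x35)

    mod := modrm >> 6       // top 2 bits: addressing mode (combines with r/m)
    reg := (modrm >> 3) & 7 // middle 3 bits: register number OR the /d extension
    rm := modrm & 7         // bottom 3 bits: register or base of the memory operand

    // For primary opcode FF, the reg/opcode field selects the instruction.
    ff := []string{"INC", "DEC", "CALL", "CALL FAR", "JMP", "JMP FAR", "PUSH", "(invalid)"}
    fmt.Printf("mod=%d reg/opcode=%d r/m=%d -> FF /%d is %s\n", mod, reg, rm, reg, ff[reg])
}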

Related

Data always being stored in the same address in elf64 NASM?

I wrote a simple Hello world program in NASM, to then look at using objdump -d out of curiosity. The program is as follows:
BITS 64
SECTION .text
GLOBAL _start
_start:
    mov rax, 0x01            ; syscall number: sys_write
    mov rdi, 0x00            ; fd argument (0 = stdin; stdout would be 1)
    mov rsi, hello_world     ; buffer address
    mov rdx, hello_world_len ; buffer length
    syscall
    mov rax, 0x3C            ; syscall number: sys_exit (60)
    syscall
SECTION .data
hello_world: db "Hello, world!", 0x0A
hello_world_len: equ $-hello_world
When I inspected this program, I found that it actually uses movabs with the hex value 0x402000 in place of the name, which makes sense, except that this would mean 'Hello, world!' is stored at 0x402000 every time the program is run. There is also no reference to 'Hello, world!' anywhere in the output of objdump -d hello_world (provided below).
I tried rewriting the program; this time I replaced hello_world on line 8 with mov rsi, 0x402000, and the program still compiled and worked perfectly.
I thought maybe the address was some encoding of the name; however, changing the text 'hello_world' in SECTION .data did not change the outcome either.
I'm more confused than anything. How does it know the address at compile time, and how come it never changes, even on recompilation?
(OUTPUT OF objdump -d hello_world)
./hello_world: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: b8 01 00 00 00 mov $0x1,%eax
401005: bf 00 00 00 00 mov $0x0,%edi
40100a: 48 be 00 20 40 00 00 movabs $0x402000,%rsi
401011: 00 00 00
401014: ba 0e 00 00 00 mov $0xe,%edx
401019: 0f 05 syscall
40101b: b8 3c 00 00 00 mov $0x3c,%eax
401020: 0f 05 syscall
(as you can see, no 'Disassembly of section .data', which further confuses me)
The string is known at compile time too. It statically exists in your executable. The compiler put it at the address in the first place, so of course it knows the address!
(And in an ASLR or dylib environment this would still apply, because all addresses relative to the module would get shifted as needed and the compiler would put a relocation entry so the loader knows there is an address reference there to fix up, but they would still stay the same relative to each other.)
This doesn't mean that every program in existence gets unique memory locations, nor that all of a program's contents have to sit idle and use up your memory even when they are rarely needed, because these are virtual addresses.
The address is only meaningful within your own process, and the page in question doesn't have to exist in physical memory: it can be paged in and out as needed, and it's the job of the OS's memory manager to decide what to keep in physical memory at any given time. Accessing an address on a page that isn't physically present makes the kernel transparently page it in at that moment. With a program this small, though, most likely the whole program will be in memory from the start.
In user-mode code, you will generally never see physical memory addresses. This is entirely abstracted away by the kernel.
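If you want to verify this yourself from Go, the standard library's debug/elf package can read the symbol table the linker wrote into the binary. A small sketch (assuming an unstripped binary named ./hello_world; the symbol name is the label from the source):

package main

import (
    "debug/elf"
    "fmt"
    "log"
)

func main() {
    f, err := elf.Open("./hello_world")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    syms, err := f.Symbols() // fails if the binary was stripped
    if err != nil {
        log.Fatal(err)
    }
    for _, s := range syms {
        if s.Name == "hello_world" {
            fmt.Printf("hello_world was placed at %#x\n", s.Value) // e.g. 0x402000
        }
    }
}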

Intel XED: different function call addresses when decoding the same instruction [duplicate]

I need a helping hand in order to understand the following assembly instruction. It seems to me that I am calling an address at someUnknownValue += 20994A?
E8 32F6FFFF - call std::_Init_locks::operator=+20994A
Whatever you're using to obtain the disassembly is trying to be helpful by giving the target of the call as an offset from some symbol it knows about -- but given that the offset is so large, it's probably confused.
The actual target of the call can be calculated as follows:
E8 is a call with a relative offset.
In a 32-bit code segment, the offset is specified as a signed 32-bit value.
This value is in little-endian byte order.
The offset is measured from the address of the following instruction.
e.g.
<some address> E8 32 F6 FF FF call <somewhere>
<some address>+5 (next instruction)
The offset is 0xFFFFF632.
Interpreted as a signed 32-bit value, this is -0x9CE.
The call instruction is at <some address> and is 5 bytes long; the next instruction is at <some address> + 5.
So the target address of the call is <some address> + 5 - 0x9CE.
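For reference, the same arithmetic in a short Go sketch (the instruction address is made up; only the E8 bytes come from the question):

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    instr := []byte{0xE8, 0x32, 0xF6, 0xFF, 0xFF} // the call from the question
    addr := uint32(0x00401000)                    // hypothetical address of the E8 byte

    // The rel32 operand is little-endian and signed: 0xFFFFF632 = -0x9CE.
    rel := int32(binary.LittleEndian.Uint32(instr[1:]))

    // The offset is measured from the next instruction, i.e. addr + 5.
    target := addr + 5 + uint32(rel)
    fmt.Printf("call target = %#x\n", target) // <some address> + 5 - 0x9CE
}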
If you are analyzing a PE file with a disassembler, the disassembler might have given you the wrong code. Many malware writers insert a spurious E8 byte as an anti-disassembly technique. You can check whether the code just above the E8 is a jump instruction whose target lies after the E8 byte.

Different result when decrypting file in Go and OpenSSL

I have written the following code to decrypt a file:
package main

import (
    "crypto/aes"
    "crypto/cipher"
    "io/ioutil"
    "log"
)

var key, iv []byte // elided in the question; 16 bytes each for AES-128-CBC

func main() {
    data, err := ioutil.ReadFile("file.encrypted")
    if err != nil {
        log.Fatal(err)
    }
    block, err := aes.NewCipher(key)
    if err != nil {
        log.Fatal(err)
    }
    // Decrypt in place; CryptBlocks allows dst and src to be the same slice.
    mode := cipher.NewCBCDecrypter(block, iv)
    mode.CryptBlocks(data, data)
    err = ioutil.WriteFile("file.decrypted", data, 0644)
    if err != nil {
        log.Fatal(err)
    }
}
I have also decrypted the file using OpenSSL:
openssl aes-128-cbc -d -in file.encrypted -out file.decrypted -iv $IV -K $KEY
The output file from the Go program is 8 bytes larger than the output file from OpenSSL.
Tail of hexdump from file generated by OpenSSL:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
ff ff ff ff ff ff ff ff |........|
Tail of hexdump from file generated by Go program:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
ff ff ff ff ff ff ff ff 08 08 08 08 08 08 08 08 |................|
Why is 08 08 08 08 08 08 08 08 appended to file output from Go program?
EDIT:
As BJ Black explains, the reason for the extra bytes in the output from my Go program is PKCS padding.
The file is encrypted with AES in CBC mode, so the plaintext input must be a multiple of the block size; padding is added to fulfill this requirement. AES has a block size of 16 bytes, so the total number of padding bytes is always between 1 and 16. Each padding byte has a value equal to the total number of padding bytes, which in my case is 0x08.
So, to find out the amount of padding added to the file, one just has to read the last byte of the decrypted data and convert it to an int:
paddingBytes := int(data[len(data)-1])
The WriteFile function can then be modified like this:
err = ioutil.WriteFile("file.decrypted", data[:len(data)-paddingBytes], 0644)
Now output from my Go program is identical to the output from OpenSSL.
What you're seeing is PKCS padding, which OpenSSL removes for you and Go, by default, does not. See the relevant Reddit post.
Basically, follow that example and you're good to go.
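If the input may be corrupt (or attacker-controlled), it is safer to validate the padding rather than blindly trusting the last byte. A minimal sketch of a checked PKCS#7 unpad (the function name is mine; blockSize is 16 for AES):

package main

import (
    "bytes"
    "errors"
    "fmt"
)

func pkcs7Unpad(data []byte, blockSize int) ([]byte, error) {
    if len(data) == 0 || len(data)%blockSize != 0 {
        return nil, errors.New("data length is not a multiple of the block size")
    }
    n := int(data[len(data)-1])
    if n == 0 || n > blockSize {
        return nil, errors.New("invalid padding length")
    }
    for _, b := range data[len(data)-n:] {
        if int(b) != n {
            return nil, errors.New("inconsistent padding bytes")
        }
    }
    return data[:len(data)-n], nil
}

func main() {
    // 8 content bytes plus 8 bytes of 0x08 padding, as in the question.
    block := append([]byte("12345678"), bytes.Repeat([]byte{0x08}, 8)...)
    fmt.Println(pkcs7Unpad(block, 16)) // the content bytes, <nil>
}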

Data misaligned when writing to SDHC card over SPI

I'm using a microcontroller (PIC18F26J50) to interface with a G.Skill 4GB microSD card.
SD Card initialization is successful and I go from receiving 0x01 (Idle) R1 tokens to 0x00 (Ready) R1 tokens.
Reading a data block works: I am able to read the location of partition 1 and read the first sector of that partition.
However, when attempting to write a block, I never see a response token. Upon dumping the raw blocks on the card, I see that the data did indeed get written, but it is not aligned properly. The best way to explain is with an actual picture.
This should be filled with 0x01, 0x02, 0x03, and so on up to 0xFF before repeating
The card works absolutely fine in Windows, and I'm able to read and write data to it properly.
Investigating, I find that the response I get is 0xCA; if you shift that right (with the idle-high 1 coming in at the top), you get 0xE5, a proper response token. The data itself is misaligned one bit to the left. Additionally, it appears that the two dummy bytes and the 0xFE token were also written. Correcting for the shift, you get:
FF FF FE 00 01 02 03 04 05 06 07 08 09 0A 0B 0C
So I removed the code to write the 2 dummy bytes and the 0xFE token, and holy s*#$ the card starts writing data IMMEDIATELY after the command, which I believe violates spec! Can anyone confirm if this is intended behavior for SDHC cards? Or is this card just running a really s*#$ty SD controller? (The latter I suspect because I have a 16GB card which is working fine)
I just had the exact same problem, which turned out not to be a problem with the SD card, but with my SPI interfacing.
The particular chip I was using (Freescale KL03) will retain the current received byte in the data buffer until you read it, even after you have started sending the next one. I was out of sync, so that each time I was writing an SPI byte, waiting for transmission and then reading from the buffer, I was actually getting the previous result and not the current one. As a consequence there was a single byte lag in every SPI transaction.
Thus, my scope revealed that I was exchanging the following with the card:
MOSI: 58 00 00 00 01 00 00 00 00 7E nn nn nn ...
MISO: FF FF FF FF FF FF FF 00 FF FF FF FF FF ...
which resulted in the misalignment you encountered. It should have been like this:
MOSI: 58 00 00 00 01 00 00 00 7E nn nn nn nn ...
MISO: FF FF FF FF FF FF FF 00 FF FF FF FF FF ...
In summary, ensure that you are sending the 7E immediately after the 00 response to your write command.
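To see how that read lag shifts every response, here is a toy Go model of the single-byte lag (purely illustrative; real SPI access is register-level code on the MCU):

package main

import "fmt"

func main() {
    // Bytes the card actually drives onto MISO during the transaction.
    miso := []byte{0xFF, 0xFF, 0xFF, 0x00, 0xFF, 0xFF}

    // Out-of-sync read: the RX buffer still holds the previous transfer's
    // byte when we read it, so every response is seen one byte late.
    stale := byte(0xFF) // leftover byte sitting in the buffer beforehand
    for _, b := range miso {
        fmt.Printf("%02X ", stale) // what the out-of-sync firmware observes
        stale = b
    }
    fmt.Println("<- the 00 response appears one byte later than it was sent")
}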

Asterisk Pointer in Assembly (I32 / x86) [duplicate]

This question already has an answer here:
How does the jmp instruction work in att assembly in this instance
(1 answer)
Closed 3 years ago.
The offending line:
8048f70: ff 24 85 00 a4 04 08 jmp *0x804a400(,%eax,4)
There is no instruction in the disassembled code at location 804a400 (my list ends at 804a247)
When I check to see what's at that memory location I get:
(gdb) x/c 0x804a40c
0x804a40c: -103 '\231'
(gdb) x/t 0x804a40c
0x804a40c: 10011001
(gdb) x/s 0x804a40c
0x804a40c: "\231\217\004\b\222\217\004\b\211\217\004\b\202\217\004\bw\217\004\b\002"
(gdb) x/3x 0x804a40c
0x804a40c: 0x99 0x8f 0x04
What exactly is this jmp statement trying to do?
That instruction is an indirect jump. This means that the memory address specified is not the jump target, but a pointer to the jump target.
First, the instruction loads the value at the memory address:
*0x804a400(,%eax,4)
which is more sensibly written as:
0x804a400 + %eax * 4 // %eax can be negative
It then sets %eip to that value.
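Concretely, the CPU loads a 4-byte little-endian pointer from the table and jumps to it. A Go sketch of that load, using the table bytes visible in the gdb dump above (the x/s output at 0x804a40c, so these are entries a little way into the table; the index value is made up):

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    // Jump-table memory: 4-byte little-endian code addresses. The octal
    // escapes \231\217\004\b from the gdb dump are the bytes 99 8f 04 08.
    table := []byte{
        0x99, 0x8f, 0x04, 0x08, // -> 0x08048f99
        0x92, 0x8f, 0x04, 0x08, // -> 0x08048f92
        0x89, 0x8f, 0x04, 0x08, // -> 0x08048f89
    }
    eax := 1 // hypothetical index computed by the program

    // jmp *base(,%eax,4): read the pointer at base + eax*4, then jump there.
    target := binary.LittleEndian.Uint32(table[eax*4 : eax*4+4])
    fmt.Printf("would jump to %#08x\n", target)
}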
The best way to decipher these is to use the Intel Programmer's Reference manual. Table 2-2 in Volume 2A provides a breakdown of the ModR/M byte and, in this case, the SIB byte as well.
