GDB Register Display Issue - math

I have recently started delving deeper into Assembly, and I could not determine what is going on when I analyze the following code segment. Essentially, 0xFFFFFFFF is moved into EAX and then 0x10 is added to it. When viewing EAX within GDB, the value after execution is 0xF rather than 0x9. When I add 0x11, rather than 0x10, the proper result (0x10) is displayed. Any help would be much appreciated.
I have attached debug output below.
The first value after command execution is EAX, which is displayed using print/x $eax.
(gdb) ni
$11 = 0xffffffff
Dump of assembler code from 0x8048096 to 0x80480a0:
=> 0x08048096 <_start+22>: add eax,0x10
0x08048099 <_start+25>: mov eax,0x0
0x0804809e <_start+30>: add BYTE PTR ds:0x804910c,0x22
End of assembler dump.
0x08048096 in _start ()
(gdb) ni
$13 = 0xf
Dump of assembler code from 0x8048099 to 0x80480a3:
=> 0x08048099 <_start+25>: mov eax,0x0
0x0804809e <_start+30>: add BYTE PTR ds:0x804910c,0x22
End of assembler dump.
0x08048099 in _start ()

You appear to expect that adding 0x10 to -1 (0xFFFFFFFF) will produce 0x9.
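But 0x10 is 16 (decimal), and adding -1 to it produces 15 (decimal), which is 0xF. In unsigned terms, 0xFFFFFFFF + 0x10 = 0x10000000F, and the carry out of bit 31 does not fit in the 32-bit register, leaving 0xF.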
So everything is working here just as it should.
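The wraparound can be reproduced outside the debugger. This is a minimal C sketch, assuming a hypothetical helper add32 that models a 32-bit ADD; both names below are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models a 32-bit ADD. Unsigned arithmetic in C
   wraps modulo 2^32, like the value left in EAX after the add. */
uint32_t add32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Checks the two cases mentioned in the question. */
void check_question_cases(void)
{
    assert(add32(0xFFFFFFFFu, 0x10u) == 0x0Fu);  /* -1 + 16 = 15 */
    assert(add32(0xFFFFFFFFu, 0x11u) == 0x10u);  /* -1 + 17 = 16 */
}
```

Both results agree with what GDB printed, so the register display is correct.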

Related

Bootloader Jump Function. How to Jump to the right Address?

I am trying to create a bootloader that jumps to my application code on an MKE02Z32VFM4 (KE02 series from Freescale). I am working with the Keil IDE 5 and the Armv6 Compiler v6.16.
After issuing the jump instruction to the application start address, the code jumps to "a" reset handler. When the instruction to jump to __main is reached, it jumps to the main of the bootloader. The flash memory is defined by the linker file as follows:
#define m_interrupts_start 0x00000000
#define m_interrupts_size 0x00000200
#define m_flash_config_start 0x00000400
#define m_flash_config_size 0x00000010
#define bootloader_start 0x00000410
#define bootloader_size 0x00000800 //2kb size 0x410+0x800=0xC10 ==> 256 byte aligned => 0xE00
#define ota_part_0_start 0x00000E00 //Vector Table interrupt must be 256 byte aligned
#define ota_part_0_size 0x00003800 //14KB (14336 Byte) 0xE00+0x3800 => 0x4600
#define ota_part_1_start 0x00004600
#define ota_part_1_size 0x00003800 //14KB (14336 Byte) 0x4600+0x3800 = 0x7E00 || flash_end == 0x00007FFF => 0x200 (512) bytes free
#define m_data_start 0x1FFFFC00 //ram start
#define m_data_size 0x00001000 //4kb
The application linker file (scatter file) is working with these defines:
#define m_interrupts_start 0x00000E00 //Address of the application reset handler
#define m_interrupts_size 0x00000200
#define m_flash_config_start 0x00001000 //some config bytes, defined by manufacturer
#define m_flash_config_size 0x00000010
#define m_text_start 0x00001010 // start address of application code
#define m_text_size 0x000035F0
#define m_data_start 0x1FFFFC00 //ram start
#define m_data_size 0x00001000 //4kb
The reset handler is written in assembler; I tried to comment the instructions:
Reset_Handler:
    cpsid i                 /* Mask interrupts */
    .equ VTOR, 0xE000ED08   // .equ is like #define in C: VTOR names the address of the Vector Table Offset Register
    ldr r0, =VTOR           // r0 = 0xE000ED08, the address of the VTOR register
    ldr r1, =__Vectors      // r1 = address of this image's vector table (its first word is the initial stack pointer)
    str r1, [r0]            // store r1 to the address in r0: VTOR now points at __Vectors
    ldr r2, [r1]            // r2 = first word of the vector table, i.e. the initial stack pointer value
    msr msp, r2             // move r2 into the special register MSP (main stack pointer)
    ldr r0, =SystemInit     // r0 = address of the SystemInit function
    blx r0                  // branch with link to the address in r0 (call SystemInit)
    cpsie i                 /* Unmask interrupts */
    ldr r0, =__main         // r0 = address of __main
    bx r0                   // jump to __main
    .pool
    .size Reset_Handler, . - Reset_Handler
The bootloader code is as follows:
Address in this first test is the value 0x00000E00 (start of user app)
__attribute__( ( naked, noreturn ) ) void BootJumpASM( uint32_t SP, uint32_t RH )
{
    __asm("MSR MSP,r0");
    __asm("BX r1");
}
static void BootJump( uint32_t *Address )
{
    if( CONTROL_nPRIV_Msk & __get_CONTROL( ) ) // This is from the Arm documentation, but it is always false in our implementation and skipped.
    { /* not in privileged mode */
        EnablePrivilegedMode( ) ;
    }
    NVIC->ICER[0] = 0xFFFFFFFF ; // disable all interrupts
    NVIC->ICPR[0] = 0xFFFFFFFF ; // clear all pending interrupts
    SysTick->CTRL = 0 ; // stop SysTick
    SCB->ICSR |= SCB_ICSR_PENDSTCLR_Msk ; // clear a pending SysTick exception
    if( CONTROL_SPSEL_Msk & __get_CONTROL( ) ) // This is from the Arm documentation, but it is always false in our implementation and skipped (only one stack pointer is used).
    { /* MSP is not active */
        __set_MSP( __get_PSP( ) ) ;
        __set_CONTROL( __get_CONTROL( ) & ~CONTROL_SPSEL_Msk ) ;
    }
    SCB->VTOR = ( uint32_t )Address ; // Set the Vector Table Offset Register to the start of the user app.
    BootJumpASM( Address[ 0 ], Address[ 1 ] ) ; // This function is taken from the Arm documentation.
}
After
SCB->VTOR = (uint32_t)Address; // Set VTOR to 0xE00
The VTOR register IS updated to 0xE00. However after executing the function:
__attribute__( ( naked, noreturn ) ) void BootJumpASM( uint32_t SP, uint32_t RH )
{
__asm("MSR MSP,r0");
__asm("BX r1"); //<-- This is the point where VTOR changes its value to 0x00 again
}
VTOR is 0x00 again and I'm in the reset handler. This reset handler leads to the bootloader main, so I assume I'm in the reset handler at 0x00 and not the one at 0xE00. I checked the flash memory and am positive that a vector table is located at both 0x000 and 0xE00. I am also positive that the application firmware is at the right place in flash.
I am assuming that either:
I defined the memory space wrong, or
the BootJumpASM function jumps to an illegal location and the MCU restarts at 0x00 with VTOR reset.
I am not sure why the BootJumpASM function uses r0 and r1, or what it does with the arguments of the function. I am simply new to assembler and all the compiler-specific attributes. The function as shown above is copied directly from:
https://developer.arm.com/documentation/ka002218/latest
While I do not understand how the compiler manages to put the function arguments into registers r0 and r1, I am sure that the mistake is on my side and not in the official Arm docs.
Can someone explain to me why "VTOR" is reset to 0x00 after the second instruction of the "BootJumpASM" function,
and why the reset handler the debugger lands in right afterwards leads to the bootloader main and not the application main? And how do I manage to jump to the right location in memory?
Thanks for your time. I hope this explanation is not too confusing.
The problem was not the jump instruction but the debugger of the Keil IDE. I set up the debug environment according to the Arm and Keil documentation, but after the jump out of the bootloader's code into the application memory area, the debugger triggered a reset. (The bootloader is a separate Keil project.)
When the debugger is started from within the application project, no such reset is triggered after the jump instruction; following the disassembly view, the bootloader executes as expected and the jump instruction works.
Thanks to all for taking time to try and find the error with me.
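On the side question of how the arguments reach r0 and r1: under the Arm procedure call standard (AAPCS), the first two integer arguments of a C function are passed in r0 and r1. So BootJumpASM( Address[ 0 ], Address[ 1 ] ) arrives with the application's initial stack pointer in r0 and its reset handler address in r1, which is why the naked function can use those registers directly. Below is a host-side C sketch of how those two words are picked out of a vector table. The names app_entry, read_app_entry and reset_vector_is_thumb are illustrative assumptions, not part of the original code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct: the two words a bootloader needs from a
   Cortex-M vector table. */
struct app_entry {
    uint32_t initial_sp;     /* word 0: value to load into MSP */
    uint32_t reset_handler;  /* word 1: address to branch to */
};

/* Reads the first two words of a vector table image. */
struct app_entry read_app_entry(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    struct app_entry e = { vector_table[0], vector_table[1] };
    return e;
}

/* Returns 1 if bit 0 of the reset vector is set. Cortex-M cores only
   execute Thumb code, so a BX target must have this bit set. */
int reset_vector_is_thumb(const struct app_entry *e)
{
    return (e->reset_handler & 1u) != 0;
}
```

A check like reset_vector_is_thumb can also serve as a cheap sanity test that a valid image is present before jumping.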

Return a pointer at a specific position - Assembly

I am a beginner in Assembly and I have a simple question.
This is my code:
BITS 64                 ; 64-bit mode
global strchr           ; Export 'strchr'
SECTION .text           ; Code section
strchr:
    mov rcx, -1
.loop:
    inc rcx
    cmp byte [rdi+rcx], 0
    je exit_null
    cmp byte [rdi+rcx], sil
    jne .loop
    mov rax, [rdi+rcx]
    ret
exit_null:
    mov rax, 0
    ret
This compiles but doesn't work. I want to reproduce the function strchr, as you can see. When I test my function with a printf, it crashes (the problem isn't the test).
I know I can INC rdi directly to walk through the rdi argument and return it at the position I want.
But I just want to know if there is a way to return rdi at the position rcx, to fix my code and probably improve it.
Your function strchr seems to expect two parameters:
a pointer to a string in RDI, and
a pointer to a character in RSI.
Is register rcx used as an index inside the string? In that case you should use al instead of cl. Be aware that you don't limit the search size. When the character referred to by RSI is not found in the string, it will probably trigger an exception. Perhaps you should test al loaded from [rdi+rcx] and quit searching when al=0.
If you want it to return a pointer to the first occurrence of the character inside the string, just replace mov rax,[rdi+rcx] with lea rax,[rdi+rcx].
Your code (from edit Version 2) does the following:
char* strchr ( char *p, char x ) {
    int i = -1;
    do {
        if ( p[i] == '\0' ) return NULL;
        i++;
    } while ( p[i] != x );
    return * (long long*) &(p[i]); /* loads 8 bytes of string data instead of returning the address */
}
As @vitsoft says, your intention is to return a pointer, but the first return (in the assembly) returns a single quadword loaded from the address of the found character: 8 characters instead of an address.
It is unusual to increment in the middle of the loop.  It is also odd to start the index at -1.  On the first iteration, the loop continue condition looks at p[-1], which is not a good idea, since that's not part of the string you're being asked to search.  If that byte happens to be the nul character, it'll stop the search right there.
If you waited to increment until both tests are performed, then you would not be referencing p[-1], and you could also start the index at 0, which would be more usual.
You might consider capturing the character into a register instead of using a complex addressing mode three times.
Further, you could advance the pointer in rdi and forgo the index variable altogether.
Here's that in C:
char* strchr ( char *p, char x ) {
    for(;;) {
        char c = *p;
        if ( c == '\0' )
            break;
        if ( c == x )
            return p;
        p++;
    }
    return NULL;
}
Thanks to your help, I finally did it!
Thanks to Erik's answer, I fixed a stupid mistake: I was comparing str[-1] to the NUL terminator, which caused the error.
And following vitsoft's answer, I switched mov to lea and it worked!
Here is my code:
strchr:
    mov rcx, -1
.loop:
    inc rcx
    cmp byte [rdi+rcx], 0
    je exit_null
    cmp byte [rdi+rcx], sil
    jne .loop
    lea rax, [rdi+rcx]
    ret
exit_null:
    mov rax, 0
    ret
The only bug remaining in the current version is loading 8 bytes of char data as the return value instead of just doing pointer math, using mov instead of lea. (After various edits removed and added different bugs, as reflected in different answers talking about different code).
But this is over-complicated as well as inefficient (two loads, and indexed addressing modes, and of course extra instructions to set up RCX).
Just increment the pointer since that's what you want to return anyway.
If you're going to loop 1 byte at a time instead of using SSE2 to check 16 bytes at once, strchr can be as simple as:
;; BITS 64 is useless unless you're writing a kernel with a mix of 32 and 64-bit code
;; otherwise it only lets you shoot yourself in the foot by putting 64-bit machine code in a 32-bit object file by accident.
global mystrchr
mystrchr:
.loop:                        ; do {
    movzx ecx, byte [rdi]     ;   c = *p;
    cmp   cl, sil             ;   if (c == needle) return p;
    je    .found
    inc   rdi                 ;   p++
    test  cl, cl
    jnz   .loop               ; } while (c != 0)
    ;; fell out of the loop on hitting the 0 terminator without finding a match
    xor   edi, edi            ; p = NULL
    ; optionally an extra ret here, or just fall through
.found:
    mov   rax, rdi            ; return p
    ret
I checked for a match before end-of-string so I'd still have the un-incremented pointer, and not have to decrement it in the "found" return path. If I started the loop with inc, I could use an [rdi - 1] addressing mode, still avoiding a separate counter. That's why I switched up the order of which branch was at the bottom of the loop vs. your code in the question.
Since we want to compare the character twice, against SIL and against zero, I loaded it into a register. This might not run any faster on modern x86-64 which can run 2 loads per clock as well as 2 branches (as long as at most one of them is taken).
Some Intel CPUs can micro-fuse and macro-fuse cmp reg,mem / jcc into a single load+compare-and-branch uop for the front-end, at least when the memory addressing mode is simple, not indexed. But not cmp [mem], imm/jcc, so we're not costing any extra uops for the front-end on Intel CPUs by separately loading into a register. (With movzx to avoid a false dependency from writing a partial register like mov cl, [rdi])
Note that if your caller is also written in assembly, it's easy to return multiple values, e.g. a status and a pointer (in the not-found case, perhaps to the terminating 0 would be useful). Many C standard library string functions are badly designed, notably strcpy, to not help the caller avoid redoing length-finding work.
Especially on modern CPUs with SIMD, explicit lengths are quite useful to have: a real-world strchr implementation would check alignment, or check that the given pointer isn't within 16 bytes of the end of a page. But memchr doesn't have to, if the size is >= 16: it could just do a movdqu load and pcmpeqb.
See Is it safe to read past the end of a buffer within the same page on x86 and x64? for details and a link to glibc strlen's hand-written asm. Also Find the first instance of a character using simd for real-world implementations like glibc's using pcmpeqb / pmovmskb. (And maybe pminub for the 0-terminator check to unroll over multiple vectors.)
SSE2 can go about 16x faster than the code in this answer for non-tiny strings. For very large strings, you might hit a memory bottleneck and "only" be about 8x faster.
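To illustrate the explicit-length case mentioned above, here is a rough C sketch of a memchr-style search using SSE2 intrinsics. The function name memchr_sse2 is hypothetical, the code is not taken from glibc, and __builtin_ctz assumes GCC or Clang. Because the length is known, full 16-byte unaligned loads are safe without any page-boundary checks; the last few bytes are handled by a scalar loop:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical name: memchr-like search, 16 bytes per iteration. */
const void *memchr_sse2(const void *buf, int needle, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    const __m128i pattern = _mm_set1_epi8((char)needle);

    while (len >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);  /* movdqu */
        __m128i eq    = _mm_cmpeq_epi8(chunk, pattern);       /* pcmpeqb */
        int mask      = _mm_movemask_epi8(eq);                /* pmovmskb */
        if (mask != 0)
            return p + __builtin_ctz((unsigned)mask);  /* lowest set bit = first match */
        p += 16;
        len -= 16;
    }
    /* Scalar tail for the remaining 0..15 bytes. */
    for (; len > 0; --len, ++p)
        if (*p == (unsigned char)needle)
            return p;
    return NULL;
}
```

A strchr version would additionally have to compare each chunk against zero and make sure a load never crosses into an unmapped page, which is the complication the answer describes.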

TASM struct initialization and pointer math issues

I am attempting to write a simple DOS test program in assembly using TASM v4.1 that walks through a structure that contains four strings of equal length, but I've hit two issues.
ideal
model small
stack 1024
struc Strings_s
s1 db 32 dup (?)
s2 db 32 dup (?)
s3 db 32 dup (?)
s4 db 32 dup (?)
ends Strings_s
codeseg
start:
mov ax, @data
mov ds, ax ; Set %DS to point to the data segment
mov cx, 4 ; Load loop count
mov si, offset mystrings.s1 ; Load seg offset of first string
start_1:
push si
call putstr ; Print asciiz string
pop si
;add si, offset (Strings_s ptr ds:0).s1 ; ***BROKEN***
add si, offset (Strings_s ptr ds:0).s2 ; FIXED
loop start_1 ; Loop
fin:
mov ax, 4C00h ; [DOS] terminate program
int 21h ; ...
putstr_0:
mov bx, 07h
mov ah, 0Eh ; [BIOS] Display character
int 10h ; ...
putstr:
lodsb ; Get next char from %SI
test al, al ; End of string?
jne putstr_0 ; no, loop
return:
ret ; Return to caller
LF equ 10
CR equ 13
dataseg
mystrings Strings_s <"One string","Two strings","Three strings","Four strings">
end start
The first issue is that I need to terminate the strings I'm declaring in the struct, but adding ,CR,LF,0 is misinterpreted as additional struct members and TASM doesn't see \r\n\0 as escape sequences.
The second issue is that I'm trying to add the length Strings_s.s1 without hard coding 32 into my code. I first tried using the sizestr directive on the struct member, but even with version t300 defined before the ideal directive, TASM considers it an undefined symbol. So then I tried the example I included using the offset and struct cast, but it ends up being encoded as add si,0.
Ideas?
EDIT: The second issue turned out to be a simple error. You need to offset to the second member of the struct. (code fixed)
EDIT2: The sizestr directive only works against text macros and is really just a simple strlen of the text after the equ directive, so it isn't what I thought it was. I also tried slipping in a Strings_len EQU $-Strings_s between the s1 and s2 members, but it incorrectly equated to 23, not 32.
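For comparison only, the same idea from the first EDIT (the stride between strings is the offset of the second member) can be written in C with offsetof. This is an analogy, not TASM syntax; the names STRING_STRIDE and nth_string are made up for the sketch:

```c
#include <assert.h>
#include <stddef.h>

struct Strings_s {
    char s1[32];
    char s2[32];
    char s3[32];
    char s4[32];
};

/* Distance from one string to the next, derived from the layout
   instead of hard-coding 32. */
enum { STRING_STRIDE = offsetof(struct Strings_s, s2) - offsetof(struct Strings_s, s1) };

/* Returns a pointer to the index-th string (0..3) by stepping the stride. */
const char *nth_string(const struct Strings_s *s, size_t index)
{
    assert(index < 4);
    return (const char *)s + offsetof(struct Strings_s, s1) + index * STRING_STRIDE;
}
```

In C, initializing the struct with short string literals also zero-fills the rest of each 32-byte member, so every string ends up terminated.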

How to move the value in a larger-typed register into a smaller-typed destination?

For example, I want to move a DWORD value in a register into a memory location typed WORD, but am getting errors:
mov [arr + eax*TYPE arr], edx ; error: operands must be same size
The [] brackets dereference an array element of type WORD.
I've tried doing this as well:
mov dx, edx ; error: operands must be same size.
mov [arr + eax*TYPE arr], dx
Also no luck trying to use PTR:
mov dx, WORD PTR edx ; error: invalid use of register
OR
mov WORD PTR [arr + eax*TYPE arr], edx ; error: invalid use of register
OR
mov [arr + eax*TYPE arr], WORD PTR edx ; error: invalid use of register
Solution? Thanks for any help!
The register DX is actually the low 16 bits of the 32-bit register EDX. You don't need mov dx, edx because the value is already in DX. So you simply need to store DX in the word-sized variable:
mov [word_variable], dx
Of course, the upper 16 bits of EDX are lost in such a transfer.
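The same truncation can be written in C. This is a small sketch with a hypothetical helper store_low_word that keeps only the low 16 bits of a 32-bit value, which is what storing DX does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the low 16 bits of a 32-bit value into
   an array of 16-bit words, like storing DX instead of EDX. */
void store_low_word(uint16_t *arr, size_t index, uint32_t value)
{
    assert(arr != NULL);
    arr[index] = (uint16_t)(value & 0xFFFFu);  /* upper 16 bits are discarded */
}
```

For example, storing 0x12345678 this way leaves 0x5678 in the array element.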

cannot jump into arduino boot loader

I want to jump from my application to the bootloader (I load via Bluetooth and have an application command to jump to the bootloader).
The following work:
void* bl = (void *) 0x3c00;
goto *bl;
or
asm volatile { jmp BOOTL ::}
asm volatile { .org 0x3c00
BOOTL: }
(but code size grows to 0x3c00)
But the most obvious option
asm volatile { jmp 0x3c00 ::}
does not (it seems it does not even produce code).
Any idea why?
The question as stated is not clear about what is working and what is failing, or about your environment, which is important. That said, I guess you're stating that the void pointer and/or "jmp BOOTL" variants work as desired but make the code appear to be huge.
I tried it on Arduino IDE 1.0.5 and saw less than 1/2K of code. Not 16K or huge.
void* bl = (void *) 0x3c00;
void setup()
{
// put your setup code here, to run once:
}
void loop()
{
goto *bl;
// put your main code here, to run repeatedly:
}
with a compile output of...
Binary sketch size: 474 bytes (of a 32,256 byte maximum)
Estimated used SRAM memory: 11 bytes (of a 2048 byte maximum)
I suspect your observation is that the linker sees the pointer out at 0x3C00, the location of the boot section (note that it is at the end of the code), so the image only looks huge. I suspect there is a lot of empty space in between. You may want to use "avr-objdump.exe -d output.elf" to see what it is actually doing, versus what you expect.
0x3C00 is a 16-bit word address.
Use 0x7800 in GCC if you are using goto. GCC uses byte addresses (0x3C00 * 2 = 0x7800).
Example:
void *bl = (void *) 0x7800;
goto *bl;
will create the following assembly language (see *.lss output file):
c4: 0c 94 00 3c jmp 0x7800 ; 0x7800 <__stack+0x6d01>
#define GO_TO_ADRR_FLASH_MEMORY_BOOT_LOADER asm volatile ("JMP 0x7800")
GO_TO_ADRR_FLASH_MEMORY_BOOT_LOADER;
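The word-to-byte conversion used above is just a multiplication by two. A tiny C sketch, with avr_word_to_byte_addr as a made-up helper name, makes the arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts an AVR flash word address to the byte
   address that GCC expects. Each flash word is 2 bytes. */
uint32_t avr_word_to_byte_addr(uint32_t word_addr)
{
    assert(word_addr <= UINT32_MAX / 2u);  /* guard against overflow */
    return word_addr * 2u;
}
```

With a word address of 0x3C00 this yields 0x7800, the value used in the goto and JMP examples.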

Resources