I have the following piece of ARM Assembly Code from my professor.
I do not understand why I need to move the stack pointer to r1 and what exactly happens.
I know from the lecture, that ...
The stack pointer is pointing to the last written value on the stack.
Does pointing to always mean that the address is stored?
I managed to get the code working, but I want to improve the code quality and understand what's going on. Also, I am not allowed to use arithmetic operations anywhere in the program.
I tried the debugger as well, but I only figured out how to watch the program counter from there.
I used
info registers sp pc
and
disas
I searched through all the options of the debugger but could not find anything helpful.
My guess is that the stack pointer register holds some address value.
// scan for users answer 'y'
ldr r0, =charplace
mov r1, sp # ???
bl scanf # Scan user's answer
ldr r1, =yes # Put address of 'y' in r1
ldrb r1, [r1] # Load the actual character 'y' into r1
At the start of the main function I do this:
.global main
main:
push {r4 - r7, lr} # copy values of these reg on top of the stack
sub sp, sp, #4 # needs to be replaced ! TODO
and at the end this:
end: add sp, sp, #4 # needs to be replaced ! TODO
pop {r4 - r7, pc} # copy values from the top of the stack back into these registers
The sub sp, sp, #4 allocates 4 bytes of space for a buffer. With a full descending stack, sp will point to the start of that buffer, with the other 3 bytes being at sp+1, sp+2, and sp+3 of course. The reason to move sp to r1 is that scanf needs the buffer address as second argument and r1 is used to pass that.
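As a rough C-level picture of what the assembly does, here is a minimal sketch. It assumes that `charplace` holds a format string such as `" %c"` (the question does not show it), and it uses `sscanf` on a string instead of `scanf` on the keyboard so the result is repeatable. The helper name `answer_is_yes` is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character from `input` into a small
   buffer that lives on the stack, then compares it with 'y'. */
int answer_is_yes(const char *input)
{
    /* Corresponds to "sub sp, sp, #4": four bytes reserved on the stack. */
    char buffer[4] = {0};

    /* With scanf the format would go in r0 and the buffer address in r1
       ("mov r1, sp"). sscanf takes the input string as an extra first
       argument, so here the buffer address is the third argument. */
    if (sscanf(input, " %c", buffer) != 1)
        return 0;               /* nothing could be read */

    /* Corresponds to loading 'y' with ldrb and comparing. */
    return buffer[0] == 'y';
}
```

The point is only that `buffer` decays to the address of the reserved stack space, which is exactly the value the assembly copies from sp into r1.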
In gdb you can examine memory using the x command, to see the stack you can do for example x/4x $sp. See help x for format specifiers.
For example, below is a piece of C code and its assembly code generated by cc compiler.
// C code (pre K&R C)
foo(a, b) {
int c, d;
c = a;
d = b;
return c+d;
}
// corresponding assembly code generated by cc
.global _foo
.text
_foo:
~~foo:
~a=4
~b=6
~c=177770
~d=177766
jsr r5, csv
sub $4, sp
mov 4(r5), -10(r5)
mov 6(r5), -12(r5)
mov -10(r5), r0
add -12(r5), r0
jbr L1
L1: jmp cret
I can understand most of the code, but I don't know what ~~foo: does, or where the magic numbers in ~c=177770 and ~d=177766 come from. The hardware is a pdp-11/40.
The tildes look like data which determines the stack usage. You might find it helpful to recall that the pdp-11 used 16-bit integers, and that DEC preferred octal numbers over hexadecimal.
That
jsr r5, csv
is a way of making register 5 (r5) point to some data (perhaps the list of offsets).
The numbers correspond to offsets on the stack in octal. The caller is assumed to do something like
push a and b onto the stack (positive offsets)
push the return address onto the stack (offset=0)
possibly push other stuff in the csv function
c and d are local variables (negative offsets, hence the "17777x")
That line
~d=177776
looks odd - I'd expect
~d=177766
since it should be below c on the stack. The -10 and -12 offsets in the register operands look like they're also octal numbers. You should be able to match up the offsets with the variables, by context.
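The octal arithmetic can be checked with a small C sketch. It only assumes that the values are 16-bit two's-complement words, as on the pdp-11; the function name `offset_from_word` is invented for this example.

```c
#include <stdint.h>

/* Interprets the low 16 bits of `word` as a signed two's-complement
   value, the way a pdp-11 would treat a negative offset. */
int offset_from_word(unsigned word)
{
    uint16_t w = (uint16_t)word;
    /* If the sign bit is set, subtract 2^16 to get the negative value. */
    return (w & 0x8000u) ? (int)w - 0x10000 : (int)w;
}
```

With this, `offset_from_word(0177770)` gives -8 (octal -10) and `offset_from_word(0177766)` gives -10 (octal -12), which lines up with the `-10(r5)` and `-12(r5)` operands in the listing.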
That's just an educated guess: I adapted the jsr+r5 idiom a while back in a text-editor.
The lines with tildes are symbol definitions. A clue for that is in the DECUS C Compiler Reference, found at
ftp://ftp.update.uu.se/pub/pdp11/rsx/lang/decusc/2.19/005003/CC.DOC
which says
3.3 Global Symbols Containing Radix-50 '$' and '.'
______ _______ __________ ________ ___
With this version of Decus C, it is possible to generate and
access global symbols which contain the Radix-50 '.' and '$'.
The compiler allows identifiers to contain the Ascii '$', which
becomes a Radix-50 '$' in the object code. The AS assembly code
shows this character as a tilde (~). The underscore character
(_) in a C program becomes a '.' in both the AS assembly
language and in the object code. This allows C programs to
access all global symbols:
extern int $dsw;
. . .
printf("Directive status = %06o\n", $dsw);
The above prints the current contents of the task's directive
status word.
So you could read
~a=4
as
$a=4
and see that $a is a (more or less) conventional symbol.
Ex : Function Implementation:
int facto(int x){
if(x==1){
return 1;
}
else{
return x*facto(x-1);
}
}
In a simpler way, let's take a stack -->
returns
|2(1)|----> 2(1) evaluates to 2
|3(2)|----> 3(2)<______________| evaluates to 6
|4(3)|----> 4(6)<______________| evaluates to 24
|5(4)|----> 5*(24)<____________| evaluates to 120
------ finally back to main...
When the functions return in reverse order, how does each one know what is behind it? The stack has activation records stored inside it, but how do they know about each other, which one has been popped and which one is on top?
How does the stack keep track of all the variables within the function being executed? Besides this, how does it keep track of what code is executed (stack pointer)? When returning from a function call, the result of that function is filled into a placeholder by using the stack pointer, but how does it know where to continue executing code? These are the basics of how the stack works that I know, but for recursion I don't understand exactly how it works.
When a function returns, its stack frame is discarded (i.e. the complete local state is popped off the stack).
The details depend on the processor architecture and language.
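To make the separate stack frames visible, here is a minimal sketch of the factorial function that records where each activation keeps its own copy of `x`. The names `frame_addr`, `depth_seen` and `MAX_DEPTH` are invented for this illustration, and the actual addresses depend on the compiler and platform.

```c
#include <stdint.h>

#define MAX_DEPTH 16

/* Address of the parameter x in each activation, in call order. */
uintptr_t frame_addr[MAX_DEPTH];
int depth_seen = 0;

int facto(int x)
{
    /* Every activation has its own x, stored in its own stack frame. */
    if (depth_seen < MAX_DEPTH)
        frame_addr[depth_seen++] = (uintptr_t)&x;

    if (x <= 1)
        return 1;

    /* x is still needed after the inner call returns, so this frame
       must stay alive while the deeper frames are created and popped. */
    return x * facto(x - 1);
}
```

After `facto(5)`, five entries are recorded and they all differ: the activations do not need to know about each other, because each one only uses its own frame and the return address that was saved when it was called.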
Check the C calling conventions for x86 processors: http://en.wikipedia.org/wiki/X86_calling_conventions, http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames and search for "PC Assembly Language" by Paul A. Carter (it's a bit outdated but it has a good explanation of the C and Pascal calling conventions for the ia32 architecture).
In C on x86 processors:
a. The calling function pushes the parameters of the called function onto the stack in reverse order, and then the call instruction pushes the return address.
push -6
push 2
call add # pushes `end:` address and then jumps to `add:` (add(2, -6))
end:
# ...
b. Then the called function pushes the base of the stack (the ebp register in ia32) (it is used to reference local variables in the caller function).
add:
push ebp
c. The called function sets ebp to the current stack pointer (this ebp will be the reference to access the local variables and parameters of the current function instance).
add:
# ...
mov ebp, esp
d. The called function reserves space on the stack for the local (automatic) variables by subtracting the size of the variables from the stack pointer.
add:
# ...
sub esp, 4 # Reserves 4 bytes for a variable
e. At the end of the called function it sets the stack pointer to be ebp (i.e frees its local variables), restores the ebp register of the caller function and returns to the return address (previously pushed by the caller).
add:
# ...
mov esp, ebp # frees local variables
pop ebp # restores old ebp
ret # pops `end:` and jumps there
f. Finally the caller adds to the stack pointer the space used by the parameters of the called function (i.e frees the space used by the arguments).
# ...
end:
add esp, 8
Return values (unless they are bigger than the register) are returned in the eax register.
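Steps a to f can be replayed in a small C simulation. This is only a sketch: the stack is an array of 32-bit words, `esp` and `ebp` are array indices rather than real registers, and the return address `0x1234` is a placeholder. All names are made up for the example.

```c
#include <stdint.h>

#define STACK_WORDS 64

int32_t stack_mem[STACK_WORDS];
int esp = STACK_WORDS;   /* index of the top word; grows toward 0 */
int ebp = STACK_WORDS;

static void push(int32_t v) { stack_mem[--esp] = v; }
static int32_t pop(void)    { return stack_mem[esp++]; }

/* Simulated callee: add(a, b). One index step equals 4 bytes. */
static int32_t sim_add(void)
{
    push(ebp);                        /* b. save the caller's ebp      */
    ebp = esp;                        /* c. mov ebp, esp               */
    esp -= 1;                         /* d. sub esp, 4 (one local)     */

    /* [ebp+8] is a, [ebp+12] is b, [ebp-4] is the local variable. */
    stack_mem[ebp - 1] = stack_mem[ebp + 2] + stack_mem[ebp + 3];
    int32_t eax = stack_mem[ebp - 1]; /* return value goes in eax      */

    esp = ebp;                        /* e. mov esp, ebp               */
    ebp = pop();                      /*    pop ebp                    */
    (void)pop();                      /*    ret pops the return address */
    return eax;
}

/* Simulated caller. */
int32_t call_add(int32_t a, int32_t b)
{
    push(b);                          /* a. arguments in reverse order */
    push(a);
    push(0x1234);                     /*    call pushes return address */
    int32_t eax = sim_add();
    esp += 2;                         /* f. add esp, 8                 */
    return eax;
}
```

`call_add(2, -6)` yields -4, and afterwards `esp` is back at its starting value, which is the whole purpose of the prologue and epilogue bookkeeping.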
I was reading an example in assembly language, and I have a little doubt. We were using only assembly in our programs, but the last unit of the semester is about merging it with Turbo C (in-line assembly), and while reading the code, there's a part which I don't quite get:
Here's the assembly part:
dosseg
.model small
.code
public _myputchar
_myputchar PROC
push bp
mov bp,sp
mov dl,[bp+4]
mov ah,2
int 21h
pop bp
ret
_myputchar ENDP
END
And here's the C part:
#include<stdio.h>
extern void myputchar( char x );
char *str={"Hola Mundo\n"};
void main ( void )
{
while(*str)
myputchar(*str++);
getchar();
}
So, it's pretty straightforward, and the program works, but what I don't get is the assembly code. The problem is: why is the base pointer (bp) offset by +4 (mov dl,[bp+4])? I would think that you only had to do mov dl,bp, but I don't get why +4. If someone can help me, that would be really appreciated! (In the include section I put the "" because the formatting tools were giving me such a headache.)
The argument (x) is pushed onto the stack before calling the function. After this, the call instruction will push the return address (2 bytes in this case) onto the stack, and the push bp at the beginning of the function will push another 2 bytes onto the stack.
So by now you've pushed 2+2 == 4 more bytes onto the stack after the argument. Since the stack grows downward that means that to get the argument you have to offset the pointer by +4 bytes.
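The byte counting can be replayed in C. This sketch models the 16-bit stack as a byte array, and it assumes a near call (2-byte return address) as in the small memory model. The return address `0x0100`, the caller's bp value and all function names are arbitrary placeholders.

```c
#include <stdint.h>

/* Pushes a 16-bit word, low byte at the lower address, and returns
   the new stack pointer. */
static uint16_t push16(uint8_t *mem, uint16_t sp, uint16_t value)
{
    sp -= 2;
    mem[sp]     = (uint8_t)(value & 0xFF);
    mem[sp + 1] = (uint8_t)(value >> 8);
    return sp;
}

/* Returns the byte that "mov dl,[bp+4]" would load for myputchar(c). */
uint8_t arg_seen_by_callee(char c)
{
    uint8_t mem[32] = {0};
    uint16_t sp = sizeof mem;
    uint16_t bp = 0x5555;                  /* caller's bp (placeholder) */

    sp = push16(mem, sp, (uint8_t)c);      /* caller pushes the argument */
    sp = push16(mem, sp, 0x0100);          /* call pushes return address */
    sp = push16(mem, sp, bp);              /* push bp                    */
    bp = sp;                               /* mov bp,sp                  */

    /* [bp] = saved bp, [bp+2] = return address, [bp+4] = argument. */
    return mem[bp + 4];
}
```

Reading `mem[bp]` or `mem[bp + 2]` instead would return part of the saved bp or of the return address, which is why the offset has to be +4.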
The character you want to print is at [bp + 4]. After mov bp,sp, bp holds the current stack pointer: [bp] is the saved bp and [bp + 2] is the return address. Remember, the stack grows down.
Not used memcpy much but here's my code that doesn't work.
memcpy((PVOID)(enginebase+0x74C9D),(void *)0xEB,2);
(enginebase+0x74C9D) is a pointer location to the address of the bytes that I want to patch.
(void *)0xEB is the op code for the kind of jmp that I want.
The only problem is that this crashes the instant that line tries to run. I don't know what I'm doing wrong, any insight?
The argument (void*)0xEB is saying to copy memory from address 0xEB; presumably you want something more like
unsigned char x = 0xEB;
memcpy((void*)(enginebase+0x74c9d), (void*)&x, 2);
in order to properly copy the value 0xEB to the destination. BTW, is 2 the right value to copy a single byte to program memory? Looks like it should be 1, since you're copying 1 byte. I'm also under the assumption that you can't just do
((char*)enginebase)[0x74c9d] = 0xEB;
for some reason? (I don't have any experience overwriting program memory intentionally)
memcpy() expects two pointers, for the destination and source buffers. Your second argument is not a pointer but rather the data itself (it is the opcode of the jmp, as you described it). If I understand correctly what you are trying to do, you should set up an array with the opcode as its contents, and provide memcpy() with the pointer to that array.
The program crashes because you try to read from a memory location outside of your assigned space (address 0xEB).
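A minimal sketch of the array approach follows. The function name `patch_short_jmp` is invented for illustration, and the sketch only covers the copy itself: when the target is real program code, the page usually also has to be made writable first (for example with VirtualProtect on Windows), which is not shown here.

```c
#include <string.h>

/* Overwrites the first byte at `target` with 0xEB, the short-jump
   opcode from the question. Only one byte is copied. */
void patch_short_jmp(unsigned char *target)
{
    static const unsigned char opcode[] = { 0xEB };

    /* The source is the ADDRESS of the opcode bytes, not the value. */
    memcpy(target, opcode, sizeof opcode);
}
```

Using `sizeof opcode` as the length keeps the byte count tied to the array, so the copy cannot read past the source buffer.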
I think I know how to handle this case, but I just want to make sure I have it right. Say you have the following C code:
int myInt = 3;
int* myPointer = &myInt;
int** mySecondPointer = &myPointer;
P contains an address that points to a place in memory which has another address. I'd like to modify the second address. So the MIPS code:
la $t0, my_new_address
lw $t1, ($a0) # address that points to the address we want to modify
sw $t0, ($t1) # store the new address into the memory pointed to by $t1
Is that the way you would do it?
Yes, that's correct as far as I can tell. It would have been easier if you used the same variable names (e.g. symbols instead of hard register names).
Why haven't you simply compiled the C code and taken a look at the listing file or assembly output? I always do that when in doubt.
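For comparison, here is what the lw/sw pair does when written in C. This is a sketch under one loud assumption: that $a0 holds the address of the memory cell containing mySecondPointer, which the question does not state. The function name `store_through` is hypothetical, and its parameters are named after the registers they stand for.

```c
/* a0: address of a cell that holds a pointer to the pointer to change.
   t0: the new address to store (my_new_address in the question). */
void store_through(int ***a0, int *t0)
{
    int **t1 = *a0;   /* lw $t1, ($a0): fetch the first address      */
    *t1 = t0;         /* sw $t0, ($t1): overwrite the second address */
}
```

Called as `store_through(&mySecondPointer, &other)`, this makes myPointer point at `other` and leaves myInt untouched. If $a0 instead held the value of mySecondPointer itself, the same two instructions would overwrite myInt, so it is worth checking which of the two is the case.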