Help deciphering a few lines of assembly - pointers

I have found these few lines of assembly in OllyDbg:
MOV ECX,DWORD PTR DS:[xxxxxxxx] ; xxxxxxxx is an address
MOV EDX,DWORD PTR DS:[ECX]
MOV EAX,DWORD PTR DS:[EDX+116]
CALL EAX
Could someone step through and tell me what's happening here?

This is an invocation of a function pointer stored in a struct.
This first line obtains a pointer stored at address DS:xxxxxxxx. The square brackets indicate dereferencing of the address, much like * in C. The value from memory is about to be used as a pointer; it is placed into ecx register.
MOV ECX,DWORD PTR DS:[xxxxxxxx] ; xxxxxxxx is an address
The second line dereferences the pointer obtained above. That value from ecx is now used as the address, which is dereferenced. The value found in memory is another pointer. This second pointer is placed into the edx register.
MOV EDX,DWORD PTR DS:[ECX]
The third line again dereferences memory; this time, the access occurs to an address offset from the pointer obtained above by 0x116 bytes. This is not evenly divisible by four, so this function pointer does not appear to come from a C++ vtable. The value obtained from the memory is this time stored in register eax.
MOV EAX,DWORD PTR DS:[EDX+116]
Finally, the function pointed to by eax is executed. This simply invokes the function via a function pointer. The function appears to take zero arguments, but on revising my answer I have a question: are there PUSH instructions which precede this snippet? Those would be the function arguments. The function might also return a value; we can't tell from our vantage point.
CALL EAX
Overall, the code snippet looks like an invocation of an extension function from a plug-in library to OllyDbg. The OllyDbg ABI specifies various structs which contain some function pointers. There are also arrays of function pointers, but the double-indirection to get to the edx-held pointer (also the not-aligned-by-even-multiple offset) makes me think this is a struct and not an array of function pointers or a C++ class's vtable.
In other words, xxxxxxxx is a pointer to a pointer to a struct containing a function pointer.
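The chain of loads described above can be sketched in C. This is only an illustration: the struct, its member names, and its layout are invented, and the real offset in the disassembly is not reproduced here.

```c
/* Hypothetical struct: member names and layout are invented for illustration. */
typedef int (*handler_fn)(void);

struct widget {
    char       name[16];   /* leading data, so the pointer is not at offset 0 */
    handler_fn on_event;   /* the function pointer that ends up in EAX */
};

static int say_hello(void) { return 42; }

static struct widget   g_widget   = { "demo", say_hello };
static struct widget  *g_widget_p = &g_widget;    /* the value loaded into EDX */
static struct widget **g_global   = &g_widget_p;  /* the value loaded into ECX */

/* Mirrors the four instructions one by one. */
int call_through_global(void)
{
    struct widget **ecx = g_global;        /* MOV ECX,[xxxxxxxx]   */
    struct widget  *edx = *ecx;            /* MOV EDX,[ECX]        */
    handler_fn      eax = edx->on_event;   /* MOV EAX,[EDX+offset] */
    return eax();                          /* CALL EAX             */
}
```

Calling call_through_global() from a test program runs say_hello through two pointer loads plus a member load, which is the same shape as the snippet in the question.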
In the OllyDbg source file PlugIn.h are some candidate struct definitions. Here's an example:
typedef struct t_sorted { // Descriptor of sorted table
char name[MAX_PATH]; // Name of table, as appears in error
int n; // Actual number of entries
int nmax; // Maximal number of entries
int selected; // Index of selected entry or -1
ulong seladdr; // Base address of selected entry
int itemsize; // Size of single entry
ulong version; // Unique version of table
void *data; // Entries, sorted by address
SORTFUNC *sortfunc; // Function which sorts data or NULL
DESTFUNC *destfunc; // Destructor function or NULL
int sort; // Sorting criterium (column)
int sorted; // Whether indexes are sorted
int *index; // Indexes, sorted by criterium
int suppresserr; // Suppress multiple overflow errors
} t_sorted;
Those examples are allowed to be NULL, and your asm snippet does not check for NULL pointer in the function pointer. Therefore, it would have to be DRAWFUNC from t_table or SPECFUNC of t_dump.
You could create a small project which includes the header file and uses printf() and offsetof() to determine whether either of those is at an offset of 0x116.
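A minimal sketch of such a project follows. Because Plugin.h is not reproduced here, demo_table is a hypothetical stand-in; in a real check you would include the header and name t_table or t_dump and the member of interest instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in; the real definitions live in OllyDbg's Plugin.h. */
typedef void demo_drawfunc(void);

typedef struct demo_table {
    char           name[260];   /* assumed MAX_PATH-sized buffer */
    int            n;
    demo_drawfunc *drawfunc;    /* member whose offset we want */
} demo_table;

/* Returns the byte offset of the function-pointer member. */
size_t drawfunc_offset(void)
{
    return offsetof(demo_table, drawfunc);
}

/* Prints the offset in hex so it can be compared with the disassembly. */
void print_drawfunc_offset(void)
{
    printf("drawfunc at offset 0x%zX\n", drawfunc_offset());
}
```

The result depends on the compiler's packing and alignment settings, so the check is only meaningful when built with the same settings as the program being inspected.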
Otherwise, I imagine that the insides of OllyDbg are written in this same style. So there are likely to be private struct definitions (not published in the Plugin.h file) used for various purposes within OllyDbg.
I would like to add, I think it's a shame that OllyDbg sources are not available. I was under the impression that the statically-linked disassembler it contains was under some kind of ?GPL license, but I haven't had any luck getting the sources to OllyDbg.

Take the 32-bit number from the address xxxxxxxx and put it in the ECX register. Then use this value as an address, read the value stored there, and put it in the EDX register. Finally, add 116 to that number and read the value at the resulting address into EAX. Execution then continues at the address now held in EAX. When that code encounters a return opcode, execution will continue after the call instruction.
This is pretty basic assembly. It makes me wonder wtf you are doing with a debugger and when your assignment is due ;-)

It's been awhile since I did ASM (1997) and even then I was only doing i386 ASM so forgive me if my answer isn't all that helpful...
Unfortunately, these 4 lines of code don't tell me much. It's mostly just loading stuff into CPU registers and calling a function.
Specifically, it looks like data or perhaps a pointer is being loaded from that address into your CX register. Then the value stored at the address held in CX is loaded into DX, so DX holds whatever CX points at. Then the value at the address in DX plus an offset of 116 is loaded into the AX register (your accumulator?)
Then whatever function located at that address copied into AX is being executed.

I'm 99% sure it's a virtual method call, considering comments about compiler being MSVC.
MOV ECX,DWORD PTR DS:[xxxxxxxx]
Pointer to a class instance is loaded into ECX from a global variable. (NB: default __thiscall calling convention uses ECX to pass the instance pointer, aka the this pointer).
MOV EDX,DWORD PTR DS:[ECX]
vftable (virtual function table) pointer is usually the first item in the class layout. Here the pointer is loaded into EDX.
MOV EAX,DWORD PTR DS:[EDX+116]
A method pointer at offset 116 (0x74) in the table is loaded into EAX. Since each pointer is 4 bytes, this is the 30th virtual method of the class (116/4 + 1).
CALL EAX
The method is called.
In original C++ it would look something like this:
g_pObject1->method30();
To know more about MSVC's implementation of C++ classes, including virtual methods, see my article here.
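As a rough illustration of the same call written out by hand, here is a C sketch. The object layout, the table, and all names are hypothetical, and the this pointer that __thiscall passes in ECX is modelled as an ordinary argument.

```c
#include <stddef.h>

struct object;                                   /* hypothetical class */
typedef int (*method_fn)(struct object *self);   /* "this" passed explicitly */

struct object {
    const method_fn *vftable;   /* first member: pointer to the table */
    int              value;
};

/* Byte offset into a table of 4-byte pointers -> zero-based slot index. */
size_t slot_from_offset(size_t byte_offset) { return byte_offset / 4; }

static int get_value(struct object *self) { return self->value; }

static method_fn g_table[30];   /* slots 0..29; only slot 29 is filled in */

int call_slot_29(struct object *obj)
{
    g_table[29]  = get_value;   /* normally set up by the compiler */
    obj->vftable = g_table;

    const method_fn *edx = obj->vftable;             /* MOV EDX,[ECX]     */
    method_fn eax = edx[slot_from_offset(116)];      /* MOV EAX,[EDX+116] */
    return eax(obj);                                 /* CALL EAX          */
}
```

Zero-based slot 29 is the 30th entry, which matches the 116/4 + 1 arithmetic above.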

Related

Return address x86 Assembly [duplicate]

This is what I see by disassemble for the statement function(1,2,3);:
movl $0x3,0x8(%esp)
movl $0x2,0x4(%esp)
movl $0x1,(%esp)
call 0x4012d0 <_Z8functioniii>
It seems the return address is not pushed onto the stack at all, so how does ret work?
On an x86 processor (as for your assembly language example), the call instruction pushes the return address on the stack and transfers control to the function.
So on entry to a function, the stack pointer is pointing at a return address, ready for ret to pop it into the program counter (EIP / RIP).
Not all processor architectures put the return address on the stack- often there's a set of one or more registers designed to hold return addresses. On ARM processors, the BL instruction places the return address in a specific register (LR, or the 'link register') and transfers control to the function.
The ia64 processor does something similar, except that there are several possible registers (b0-b7) that can receive the return address and one will be specified in the instruction (with b0 being the default).
Ideally, the call statement should take care of that. The program counter's next location will be pushed onto the stack. When the function (subroutine) that was called completes its work and encounters a return statement, the address that was pushed onto the stack is popped and control goes to it.
It depends on the ABI and the architecture, but if the return address does end up on the stack it's a side-effect of the call instruction that puts it there.
call pushes the current value of the RIP register (return address) to the stack + does the call
ret pops the return address(that call pushed) from the top of the stack (RSP register points there) and writes it in the RIP register.
Example on a GNU/Linux box: function f calls function g and lets look at the frame of g.
LOW ADDRESS
... <- RSP (stack pointer shows top of stack) register points at this address
g's local vars
f's base pointer (old RBP value) <- RBP (base pointer) register points at this address
f's ret address (old RIP value) (this is what the call (from f) pushed, and what the ret (from g) will pop)
args that f called g with and didn't fit in the registers (I think on Windows this is different)
...
HIGH ADDRESS
g will free the local vars (movq %rbp, %rsp)
g will pop the "old RBP" and store it in RBP register (pop %rbp)
g will ret, which will modify RIP with the value that is stored where RSP points at
Hope it helps
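The push and pop performed by call and ret can be modelled in a few lines of C. This is a toy sketch, not an emulator: the struct and function names are invented, and the stack pointer is an array index rather than a real address.

```c
#include <stdint.h>

/* Toy model: a downward-growing stack of 32-bit slots and an instruction pointer. */
struct toy_cpu {
    uint32_t eip;
    uint32_t esp;          /* index into stack[], not a real address */
    uint32_t stack[16];
};

void toy_init(struct toy_cpu *c, uint32_t start)
{
    c->eip = start;
    c->esp = 16;           /* empty stack: one past the last slot */
}

/* call: push the address of the next instruction, then jump. */
void toy_call(struct toy_cpu *c, uint32_t target, uint32_t insn_len)
{
    c->stack[--c->esp] = c->eip + insn_len;
    c->eip = target;
}

/* ret: pop the saved address back into the instruction pointer. */
void toy_ret(struct toy_cpu *c)
{
    c->eip = c->stack[c->esp++];
}
```

With a 5-byte call instruction at some address A, toy_call leaves A + 5 on top of the stack, and toy_ret resumes there. Nothing in the caller's own code has to push the return address explicitly.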

Does a function parameter that accepts a string reference point directly to the string variable or the data on the heap in Rust

I've taken this picture and code from The Rust Book.
Why does s point to s1 rather than just the data on the heap itself?
If so, is this how it works? How does s point to s1? Is it allocated memory with a ptr field that contains the memory address of s1? Then, does s1 in turn point to the data?
In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?
This is my first systems level language, so I don't think comparisons to C/C++ will help me grok this. I think part of the problem is that I don't quite understand what exactly pointers are and how the OS allocates/deallocates memory.
fn main() {
let s1 = String::from("hello");
let len = calculate_length(&s1);
println!("The length of '{}' is {}.", s1, len);
}
fn calculate_length(s: &String) -> usize {
s.len()
}
The memory is just a huge array, which can be indexed by any offset (e.g. u64).
This offset is called address,
and a variable that stores an address is called a pointer.
However, usually only some small part of memory is allocated, so not every address is meaningful (or valid).
Allocation is a request to make a (sequential) range of addresses meaningful to the program (so it can access/modify).
Every object (and by object I mean any type) is located in allocated memory (because non-allocated memory is meaningless to the program).
A reference is actually a pointer that is guaranteed (by the compiler) to be valid (i.e. derived from the address of some object known to the compiler). Take a look at the std docs as well.
Here is an example of these concepts (playground):
use std::mem::MaybeUninit;

fn main() {
    // In a real program the memory is defined implicitly;
    // it is made explicit here for the sake of the example.
    // A small array stands in for the full `usize::MAX`-byte address space.
    let mut memory = [MaybeUninit::<u8>::uninit(); 4096];
    // Every value of type `usize` is a valid address.
    const SOME_ADDR: usize = 1234;
    // Any address can safely be bound to a pointer,
    // which *may* point to either valid or invalid memory.
    let _ptr = SOME_ADDR as *const u8;
    // Knowing an address, you can find its offset in our memory.
    let other_ptr = unsafe { memory.as_mut_ptr().add(SOME_ADDR) } as *mut u8;
    // Oversimplified allocation; in real life the OS gives you a block of memory.
    unsafe { other_ptr.write(15) };
    // Now it is *meaningful* (i.e. there is no undefined behavior) to make a reference.
    let refr: &u8 = unsafe { &*other_ptr };
    assert_eq!(*refr, 15);
}
I hope that clears most things up, but let's cover the questions explicitly.
Why does s point to s1 rather than just the data on the heap itself?
s is a reference (i.e. a valid pointer), so it holds the address of s1. Although a compiler might (and probably would) optimize it so that it occupies the same piece of memory as s1, logically it remains a different object that points to s1.
How does the s point to s1. Is it allocated memory with a ptr field that contains the memory address of s1.
The chain of "pointing" still persists, so a call to s.len() is internally converted to s.deref().len, and accessing some byte of the string array is converted to s.deref().ptr.add(index).deref().
There are 3 blocks of memory that are displayed on the picture: &s, &s1, s1.ptr are different (unless optimized) memory addresses. And all of them are stored in the allocated memory. The first two are actually stored at pre-allocated (i.e. before calling main function) memory called stack and usually it is not called an allocated memory (the practice I ignored in this answer though). The s1.ptr pointer, in contrast, points to the memory that was allocated explicitly by a user program (i.e. after entering main).
In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?
Yes, exactly. Length and capacity are just common unsigned integers.

Pointer to a register on a 16 bit controller

How do you declare a pointer on a 16 bit Renesas RL78 microcontroller using IAR's EWB RL78 compiler to a register which has a 20 bit address?
Ex:
static int *ptr = (int *)0xF1000;
The above does not work because pointers are 16 bit addresses.
If the register in question is an on-chip peripheral, then it is likely that your toolchain already includes a processor header with all registers declared, in which case you should use that. If for some reason you cannot or do not wish to do that, then you could at least look at that to see how it declares such registers.
In any event you should at least declare the address volatile since it is not a regular memory location and may change beyond the control and knowledge of your code as part of the normal peripheral behaviour. Moreover you should use explicit sized data types and it is unlikely that this register is signed.
#include <stdint.h>
...
static volatile uint16_t* ptr = (uint16_t*)0xF1000u ;
Added following clarification of target architecture:
The IAR RL78 compiler supports two data models - near and far. From the IAR compiler manual:
● The Near data model can access data in the highest 64 Kbytes of data memory
● The Far data model can address data in the entire 1 Mbytes of data memory.
The near model is the default. The far model may be set using the compiler option: --data_model=far; this will globally change the pointer type to allow 20 bit addressing (pointers are 3 bytes long in this case).
Even without specifying the data model globally it is possible to override the default pointer type by explicitly specifying the pointer type using the keywords __near and __far. So in the example in the question the correct declaration would be:
static volatile uint16_t __far* ptr = (uint16_t*)0xF1000u ;
Note the position of the __far keyword is critical. Its position can be used to declare a pointer to far memory, or a pointer in far memory (or you can even declare both to and in far memory).
On an RL78, 0xF1000 in fact refers to the start of data flash rather than a register as stated in the question. Typically a pointer to a register would not be subject to alteration (which would mean it referred to a different register), so might reasonably be declared const:
static volatile uint16_t __far* const ptr = (uint16_t*)0xF1000u ;
Similarly to __far the position of const is critical to the semantics. The above prevents ptr from being modified but allows what ptr refers to to be modified. Being flash memory, this may not always be desirable or possible, so it is possible that it could reasonably be declared a const pointer to a const value.
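The effect of where const sits can be tried in standard C without the IAR-specific __far keyword. In this sketch an ordinary variable, fake_register, stands in for the hardware location; it and the other names are assumptions made for illustration.

```c
#include <stdint.h>

/* Ordinary variable standing in for a hardware location. */
static uint16_t fake_register;

/* const after the '*': the pointer cannot be re-aimed, the target can be written. */
static volatile uint16_t *const reg_rw = &fake_register;

/* const on both sides: neither the pointer nor the target can be changed via reg_ro. */
static const volatile uint16_t *const reg_ro = &fake_register;

uint16_t write_then_read(uint16_t v)
{
    *reg_rw = v;        /* allowed: the target is writable through reg_rw */
    /* reg_rw = 0;         would not compile: the pointer itself is const */
    /* *reg_ro = v;        would not compile: the target is const via reg_ro */
    return *reg_ro;     /* reading through the read-only view is fine */
}
```

On the real target the same placement rules apply, with __far added next to the type it qualifies as shown in the declarations above.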
Note that for RL78 Special Function Registers (SFR) the IAR compiler has a keyword __sfr specifically for addressing registers in the area 0xFFF00-0xFFFFF:
Example:
#pragma location=0xFFF20
__no_init volatile uint8_t __sfr PORT1; // PORT1 is located at address 0xFFF20
Alternative syntax using IAR specific compiler extension:
__no_init volatile uint8_t __sfr PORT1 @ 0xFFF20 ;

Can you use dereferencing to call a pointer to a function in assembly?

Exactly how it sounds.
I load the OFFSET of a procedure into a register, then try to call that register:
MOV EBX, OFFSET MyProc
CALL EBX
At first I would assume that this will call the function, however when you call a procedure you don't type CALL OFFSET MyProc, you simply type CALL MyProc.
In C you can call a pointer to a function with the * operator: (*MyProc)();.
Which leads me to wonder if dereferencing the pointer to the function would call the procedure.
CALL [EBX]
However if I dereference it, masm tells me that I need to specify a size, the only possible sizes that I am aware of that I could specify are DWORD PTR, WORD PTR, and BYTE PTR, and I don't think that a procedure is of a particular size.
To sum it up, can you call a pointer to a procedure simply by directly supplying the pointer as an operand to the call instruction, or would you have to dereference the pointer in the call instruction?
Thanks
Why not CALL OFFSET MyProc - because that would be annoying to type every time, and the inconsistent syntax didn't bother MASM creators much (consider the mov eax,var1 vs mov eax,[ebx], both dereferencing memory).
The call [ebx] would fetch the value stored at the address in ebx and use that as the final target address, so in your case it would try to interpret the first instructions of the procedure as the target address, and jump who-knows-where (probably causing an illegal access crash from the OS).
The required size in such a case is not a classic integer size, but a jump/call address size, like NEAR PTR and FAR PTR, which affects how many bytes from memory will be used. NEAR PTR is 32b wide in 32b mode vs 16b in real mode (just the offset part). FAR PTR is 32b in real mode (16b offset + 16b segment), and 48b in 32b protected mode (32b offset + 16b segment, which works more like a selector or something; I never actually needed to fully understand this one, so consult your favourite x86 documentation/book for details).
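The difference between the two forms can be sketched in C, which may help map the question's (*MyProc)() onto the assembly. The function names here are invented for illustration.

```c
typedef int (*proc_fn)(void);

static int my_proc(void) { return 7; }

/* Like MOV EBX, OFFSET MyProc / CALL EBX:
   the register holds the code address itself. */
int call_direct(void)
{
    proc_fn ebx = my_proc;
    return ebx();
}

/* Like CALL DWORD PTR [EBX]:
   EBX holds the address of a variable that in turn holds the code address. */
int call_indirect(void)
{
    proc_fn  slot = my_proc;
    proc_fn *ebx  = &slot;
    return (*ebx)();
}
```

Note that in C, applying * to a function pointer does not read memory at the function's address; (*fp)() and fp() are the same call, so the C syntax in the question corresponds to CALL EBX, not to CALL [EBX].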

x86 asm, dereferenced pointer not getting updated

Here's a test procedure from a program I'm working on. I pass in some parameters via the stack, one of which is a pointer. When I try to change the value of the dereferenced pointer, the variable isn't updated.
_testProc proc
push bp ;Save base pointer to stack
mov bp, sp ;Set new base pointer
sub sp, 4 ;Allocate stack space for locals
pusha ;Save registers to stack
mov di, [bp + 08] ;Parm 3 - ptr to variable
mov word ptr [di], 10 ; <---- Doesn't work. di contains an address,
; but what it points at doesn't get updated
popa ;Restore registers from stack
mov sp, bp ;Remove local vars by restoring sp
pop bp ;Restore base pointer from stack
ret 6 ;Return and also clean up parms on stack
_testProc endp
The 8086 produces an address by combining the contents of a segment register and an index register; I show that as [SR,IR].
Your update via register di is updating a location defined by [DS,DI]; mov instructions without any special prefix default to using the DS register. If you got the address DI as an offset for some other segment (ES? SS?) then you are in effect combining the wrong registers to hit the address you desire.
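To make the [SR,IR] combination concrete: in 8086 real mode the physical address is the segment value shifted left by four bits plus the offset, wrapped to 20 bits. The helper name below is made up for this sketch.

```c
#include <stdint.h>

/* Real-mode 8086: physical = segment * 16 + offset, wrapped to 20 bits. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For instance, the same offset 0x0010 gives 0x10010 with a segment of 0x1000 but 0x20010 with a segment of 0x2000, which is why an offset taken relative to SS or ES lands on the wrong byte when it is used with DS.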
Your mistake is in not being clear about what the conventions are for passing a "pointer" to your routine. What you've defined assumes a relative offset from DS.
The very best thing you can do is to abandon 16-bit segmented code as soon as you can! :)
Failing that, there's "far data" and a "far pointer" to point to it. Your "proc" doesn't say if it's near or far - I assume near (or Parm3 probably isn't where you think it is on the stack... since the far return address is 4 bytes). If the variable you intend to alter is on the stack, you're in for some more complication. mov word ptr ss:[di], 10 at least. If you need to handle either a local or static variable, I think you're going to need a far pointer (4 bytes, segment and offset) to find it.
What first came to my mind is that you say you're trying to change the value of a dereferenced pointer, you don't "dereference" it (as I understand it). Try mov di, [di] after you get the value off the stack. Easy to try, anyway. :)
If all else fails, show us the calling code. (and get into 32-bit code as soon as you can!)
