Counting number of bytes in shellcode - hex

I'm wondering how many bytes the following code (shellcode) is:
"\x31\xc0" /* Line 1: xorl %eax,%eax */
"\x50" /* Line 2: pushl %eax */
"\x68""//sh" /* Line 3: pushl $0x68732f2f */
"\x68""/bin" /* Line 4: pushl $0x6e69622f */
"\x89\xe3" /* Line 5: movl %esp,%ebx */
"\x50" /* Line 6: pushl %eax */
"\x53" /* Line 7: pushl %ebx */
"\x89\xe1" /* Line 8: movl %esp,%ecx */
"\x99" /* Line 9: cdq */
"\xb0\x0b" /* Line 10: movb $0x0b,%al */
"\xcd\x80" /* Line 11: int $0x80 */
I know that there are eight bits in a byte, so one hexadecimal pair is one byte. For example, \x31 is 0x31, which is one byte. But I'm unsure how to count the //sh and /bin text on lines 3 and 4, respectively. Do I count those as single bytes as well? So would the total size of this shellcode be 18 bytes?

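A char is 1 byte, so each of the two strings is 4 bytes (one byte per character). Together with its \x68 opcode byte, each of those two push instructions is therefore 5 bytes, which makes the whole shellcode 24 bytes, not 18. You can confirm the size of a string using Python: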
>>> len(b'/bin')
4
>>> (0x6e69622f).to_bytes(4, "little")
b'/bin'
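To check the total rather than a single string, one option is to paste the whole shellcode into a Python bytes literal and take its length. This is only a sketch; the variable name shellcode is illustrative and not from the question. Adjacent literals are concatenated, much as in C.

```python
# The shellcode from the question, one instruction per line.
shellcode = (
    b"\x31\xc0"        # xorl %eax,%eax
    b"\x50"            # pushl %eax
    b"\x68" b"//sh"    # pushl $0x68732f2f (1 opcode byte + 4 data bytes)
    b"\x68" b"/bin"    # pushl $0x6e69622f (1 opcode byte + 4 data bytes)
    b"\x89\xe3"        # movl %esp,%ebx
    b"\x50"            # pushl %eax
    b"\x53"            # pushl %ebx
    b"\x89\xe1"        # movl %esp,%ecx
    b"\x99"            # cdq
    b"\xb0\x0b"        # movb $0x0b,%al
    b"\xcd\x80"        # int $0x80
)

print(len(shellcode))  # 24
```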

Related

Assembly early return on a recursive function

This is more an academic exercise than anything else, but I'm looking to write a recursive function in assembly that, if it receives an "interrupt signal", returns to the main function, and not just to the function that invoked it (which is usually the same recursive function).
For this test, I'm doing a basic countdown and printing one-character digits (8...7...6...etc.). To simulate an "interrupt", I am using the number 7, so when the function hits 7 (if it starts above that), it will return a 1 meaning it was interrupted, and if it wasn't interrupted, it'll count down to zero. Here is what I have thus far:
.globl _start
_start:
# countdown(8);
mov $8, %rdi
call countdown
# return 0;
mov %eax, %edi
mov $60, %eax
syscall
print:
push %rbp
mov %rsp, %rbp
# write the value to a memory location
pushq %rdi # now 16-byte aligned
add $'0', -8(%rbp)
movb $'\n', -7(%rbp)
# do a write syscall
mov $1, %rax # linux syscall write
mov $1, %rdi # file descriptor: stdout=1
lea -8(%rbp), %rsi # memory location of string goes in rsi
mov $2, %rdx # length: 1 char + newline
syscall
# restore the stack
pop %rdi
pop %rbp
ret;
countdown:
# this is the handler to call the recursive function so it can
# pass the address to jump back to in an interrupt as one of the
# function parameters
# (%rsp) currently holds the return address, and let's pass that as the second argument
mov %rdi, %rdi # redundant, but for clarity
mov (%rsp), %rsi # return address to jump
call countdown_recursive
countdown_recursive:
# bool countdown(int n: n<10, return_address)
# ...{
push %rbp
mov %rsp, %rbp
# if (num == 0) ... return
cmp $0, %rdi
jz end
# imaginary interrupt on num=7
cmp $7, %rdi
jz fast_ret
# else...printf("%d\n", num);
push %rsi
push %rdi
call print
pop %rdi
pop %rsi
# --num
dec %rdi
# countdown(num)
call countdown_recursive
end:
# ...}
mov $0, %eax
mov %rbp, %rsp
pop %rbp
ret
fast_ret:
mov $1, %eax
jmp *%rsi
Does the above look like a valid approach, passing the memory address I want to go back to in rsi? The function was incredibly tricky for me to write, but I think mainly due to the fact that I'm pretty new/raw with assembly.
As well as returning to this alternate return address, you also need to restore the caller's (call-preserved) registers, not just the ones of your most recent parent. That includes RSP.
You're basically trying to re-invent C's setjmp / longjmp, which does exactly this, including resetting the stack pointer back to the scope where you called setjmp. I think a few of the questions in SO's setjmp tag are about implementing your own setjmp / longjmp in asm.
Also, to make this more efficient you might want to use a custom calling convention where the return address pointer (or a jmpbuf pointer after implementing the above) is in a call-preserved register like R15, so you don't have to save/restore it around print calls inside the body of your recursive function.
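As a rough model of the control flow only (not of the register and stack handling), the same "jump straight back to the outermost caller" behaviour can be sketched in Python with an exception, which unwinds every recursive frame in one step, similar in effect to longjmp. The names Interrupted, countdown_recursive, and the out list are assumptions made for this illustration.

```python
class Interrupted(Exception):
    """Stands in for the simulated interrupt (hypothetical name)."""


def countdown_recursive(n, out):
    if n == 0:
        return
    if n == 7:
        # Non-local exit: skips all intermediate recursive frames.
        raise Interrupted
    out.append(n)  # stands in for print(n)
    countdown_recursive(n - 1, out)


def countdown(n):
    """Return (status, digits seen): status is 1 if interrupted, else 0."""
    out = []
    try:
        countdown_recursive(n, out)
    except Interrupted:
        return 1, out
    return 0, out


print(countdown(8))  # (1, [8])
print(countdown(5))  # (0, [5, 4, 3, 2, 1])
```

The try block plays the role of the setjmp call site: it is the one place that knows how to resume after the early exit, and the runtime restores the stack for you.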

pointers in c translated to assembly

As I understand it, the first line of the code below says to store the pointer in %rsi in %eax. If that's correct, does the second line say to add the pointer in %eax to the pointer in %rdi?
I'm very confused. I know assembly doesn't have pointers; I'm just speaking in terms of translating assembly to C. I have to rewrite the assembly code as C code, and these two lines are killing me. Can I have clarification?
movl (%rsi), %eax
addl %eax, (%rdi)
Since you seem to be using AT&T syntax, the parentheses dereference the address held in the register, and the destination is the second operand. The C equivalent for these expressions would be:
/* Expression 1: movl (%rsi), %eax */
unsigned int* p = some_address; /* p plays the role of %rsi */
unsigned int i = *p; /* *p dereferences the address in p; i plays the role of %eax */
/* Expression 2: addl %eax, (%rdi) */
unsigned int* q = some_other_address; /* q plays the role of %rdi */
*q += i; /* Increase the value pointed to by q by i */
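To see which operand is read and which is written, here is a small Python model of the two instructions. It is only a sketch: memory is a plain dict from address to 32-bit value, and the function name run and the addresses 0x1000 and 0x2000 are made up for the example.

```python
MASK32 = 0xFFFFFFFF  # movl/addl operate on 32-bit values


def run(memory, rsi, rdi):
    """Model of: movl (%rsi), %eax ; addl %eax, (%rdi)."""
    eax = memory[rsi] & MASK32                   # load from the address in rsi
    memory[rdi] = (memory[rdi] + eax) & MASK32   # read-modify-write at the address in rdi
    return eax


memory = {0x1000: 5, 0x2000: 8}
eax = run(memory, rsi=0x1000, rdi=0x2000)
print(eax, memory[0x1000], memory[0x2000])  # 5 5 13
```

The value at the address in rsi is unchanged, while the value at the address in rdi grows by the loaded amount, which in C terms is *q += *p.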

In Ada, is there a way to make an enumeration type act like a modulus type -- to wrap to 0 after the last of its range?

I am re-writing an encryption/compression library and it seems like it is getting to be a lot of processing per bytes processed. I would prefer to use an enumeration type when choosing which of several limited ways the encryption can go (the proper way), but when those paths become cyclical, I have to add extra code to test for type'last and type'first. I can always just write such a condition in for the type, or assign the addition/subtraction operator on the type a function to wrap around the result, but that is more code and processing that will add up quickly when it has to run every eight bytes along with everything else. Is there a way to make the operation about as efficient as if it were a simple "mod" type, like
type Modular is mod 64 ....;
for ......;
pragma ....;
type Frequency_Counter is array(Modular) of Long_Integer;
Head : Modular := (others => 0);
Freq : Frequency_Counter(Size) := (others => 0);
Encryption_Label : Modular := Hash3;
Block_Sample : Modular := Hash5;
...
Hash3 := Hash3 + 1;
Freq (Hash3):= Freq(Hash3) + 1; -- Here is where my made-on-the-fly example is focused
I think I can make the whole algorithm more efficient and use enumeration types if I can just get the enumeration type to do math in the processor in the same number of cycles as with a mod type math. I have gotten a little creative in thinking of a way, but they were too obviously not right for me to use any of them as an example. The only thing I can think might be possible exceeds my skill, and that is making a procedure using inline ASM (gas assembly language syntax) to make the operation very direct to the processor.
PS: I know this is a minor gain, alone. Any gain is appropriate for the application.
Not sure that it’ll make much difference!
Given this
package Cyclic is
type Enum is (A, B, C, D, E);
type Modular is mod 5;
function Next_Enum (En : Enum) return Enum is
(if En = Enum'Last then Enum'First else Enum'Succ (En)) --'
with Inline_Always;
end Cyclic;
and
with Cyclic; use Cyclic;
procedure Cyclic_Use (N : Natural; E : in out Enum; M : in out Modular) is
begin
begin
for J in 1 .. N loop
E := Next_Enum (E);
end loop;
end;
begin
for J in 1 .. N loop
M := M + 1;
end loop;
end;
end Cyclic_Use;
and compiling using GCC 5.2.0 with -O3 (gnatmake -O3 -c -u -f cyclic_use.adb -cargs -S), the x86_64 assembler generated for the two loops is
(enumeration)
L3:
leal 1(%rsi), %ecx
addl $1, %eax
cmpb $4, %sil
cmove %r8d, %ecx
cmpl %eax, %edi
movl %ecx, %esi
jne L3
(modular)
L4:
leal -4(%rdx), %ecx
addl $1, %eax
cmpb $3, %dl
leal 1(%rdx), %r8d
movl %ecx, %edx
cmovle %r8d, %edx
cmpl %eax, %edi
jne L4
I don’t pretend to know x86_64 assembler, and I don’t know why the enumeration version compares against 4 while the modular version compares against 3, but these look very similar to me, and the enumeration version is one instruction shorter.

x86 pointers in commands

I'm new to x86.
I know what this kind of thing with the pointers means.
*command* %eax, %ebx
But how are these different, and what do they mean?
*command* %eax, (%ebx)
*command* (%eax), %ebx
*command* (%eax, %ebx, 4), %ecx
I think your question is, "What do the parentheses around a register's name mean/do?" At a high level, the parentheses say to access the memory at the address held in the register and use that memory location as the operand of the instruction. I.e., whereas
*command* %eax, %ebx
operates on the values in the %eax and %ebx registers directly,
*command* (%eax), (%ebx)
operates on the values in memory pointed to by %eax and %ebx. (Most x86 instructions accept at most one memory operand, so treat this two-parenthesis form as an illustration only.) There are actually a few more variants of the parentheses than you listed. For a description of them (including the last instruction example that you asked about), check here.
Hope that helps. Feel free to post back if you have any more questions.
Assume the following operations:
movl %eax, (%ebx) [1]
movl (%eax), %ebx [2]
movl (%eax, %ebx, 4), %ecx [3]
1. The first one copies the value of eax to the memory address stored in ebx, similar to this in C:
*(int *)ebx = eax; // copy eax into address
2. The second copies the value stored at the address in eax into ebx:
ebx = *(int *)eax; // copy what is at the address into ebx
3. This is an array operation, where eax is the base address, ebx is the index, and 4 is the size of an element of the array:
ecx = ((int *)eax)[ebx];
calculated as:
ecx = *(int *)((char *)eax + ebx * sizeof(int));
In AT&T asm syntax, parentheses mean "dereference" -- roughly the same as the unary * operator in C. The source operand comes first and the destination last. So some rough equivalences:
movl %eax, %ebx      ebx = eax
movl %eax, (%ebx)    *ebx = eax
movl (%eax), %ebx    ebx = *eax
That leaves your last example:
movl (%eax, %ebx, 4), %ecx
In this case, there are multiple values that are combined to form the address to dereference. It's roughly equivalent to
ecx = *(eax + ebx*4)
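The address arithmetic can be tried out with a toy memory model in Python. This is a sketch under assumptions: memory is a 64-byte bytearray, the helpers load32 and store32 are hypothetical, and values are 32-bit little-endian as on x86.

```python
memory = bytearray(64)  # toy flat memory, addresses 0..63


def store32(addr, value):
    """Write a 32-bit little-endian value at addr."""
    memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def load32(addr):
    """Read a 32-bit little-endian value from addr."""
    return int.from_bytes(memory[addr:addr + 4], "little")


# Place an int array {10, 20, 30, 40} at address 16.
base = 16
for i, v in enumerate([10, 20, 30, 40]):
    store32(base + 4 * i, v)

# movl (%eax), %ebx : load from the address in eax
eax = base
first_elem = load32(eax)

# movl (%eax,%ebx,4), %ecx : load from eax + ebx*4
eax, ebx = base, 2
third_elem = load32(eax + ebx * 4)

# movl %eax, (%ebx) : store eax at the address in ebx
eax, ebx = 99, 48
store32(ebx, eax)

print(first_elem, third_elem, load32(48))  # 10 30 99
```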

boost mpi equivalent of status.MPI_SOURCE

Is there a Boost.MPI equivalent of the following C MPI code? I'm trying to port the following standard MPI code, which is a basic master/slave template found here. According to the Boost.MPI documentation, there are only three parameters for a send or recv: rank, tag, and buffer.
while (work != NULL) {
/* Receive results from a slave */
MPI_Recv(&result, /* message buffer */
1, /* one data item */
MPI_INT, /* of type integer */
MPI_ANY_SOURCE, /* receive from any sender */
MPI_ANY_TAG, /* any type of message */
MPI_COMM_WORLD, /* default communicator */
&status); /* info about the received message */
/* Send the slave a new work unit */
MPI_Send(&work, /* message buffer */
1, /* one data item */
MPI_INT, /* data item is an integer */
status.MPI_SOURCE, /* to who we just received from */
WORKTAG, /* user chosen message tag */
MPI_COMM_WORLD); /* default communicator */
/* Get the next unit of work to be done */
work = get_next_work_item();
}
From the boost.MPI documentation:
MPI_ANY_SOURCE becomes any_source
MPI_ANY_TAG becomes any_tag
The communicator::recv() method returns an instance of the status class that provides all the information that you need:
status.MPI_SOURCE is returned by status::source()
status.MPI_TAG is returned by status::tag()
It also provides two cast operators to convert its content to an MPI_Status structure.

Resources