ltrace does not show sin() in the output - ltrace

I wanted to list the functions used in my application program using ltrace. It works but does not list "sin()" in the output.
#include<stdio.h>
#include<math.h>
int main()
{
float x=0;
printf("Hello World!!\n");
x=sin(2);
printf("sin(2)=%f\n",x);
return 0;
}
Output:
[kashi@localhost TestPrgms]$ gcc -o ltrace_test ltrace_test.c -lm
[kashi@localhost TestPrgms]$ ltrace ./ltrace_test
__libc_start_main(0x80484e0, 1, 0xbfddbfd4, 0x8048530 <unfinished ...>
puts("Hello World!!"Hello World!!
) = 14
printf("sin(2)=%f\n", 0.909297sin(2)=0.909297
) = 16
+++ exited (status 0) +++

This is because the argument to sin is a compile-time constant, so gcc evaluates the call at compile time and optimizes it out (even when compiling with -O0, and the program then links even without -lm). This is the result of running disass main in gdb:
0x0000000000400580 <+0>: push %rbp
0x0000000000400581 <+1>: mov %rsp,%rbp
0x0000000000400584 <+4>: sub $0x10,%rsp
0x0000000000400588 <+8>: mov 0xee(%rip),%eax # 0x40067c
0x000000000040058e <+14>: mov %eax,-0x4(%rbp)
0x0000000000400591 <+17>: mov $0x400660,%edi
0x0000000000400596 <+22>: callq 0x400450 <puts@plt>
0x000000000040059b <+27>: mov 0xdf(%rip),%eax # 0x400680
0x00000000004005a1 <+33>: mov %eax,-0x4(%rbp)
0x00000000004005a4 <+36>: movss -0x4(%rbp),%xmm0
0x00000000004005a9 <+41>: cvtps2pd %xmm0,%xmm0
0x00000000004005ac <+44>: mov $0x40066e,%edi
0x00000000004005b1 <+49>: mov $0x1,%eax
0x00000000004005b6 <+54>: callq 0x400460 <printf@plt>
0x00000000004005bb <+59>: mov $0x0,%eax
0x00000000004005c0 <+64>: leaveq
0x00000000004005c1 <+65>: retq
There is no call for sin here.
Changing your code to read:
#include<stdio.h>
#include<math.h>
int main()
{
float x, y;
scanf("%f", &x);
y=sin(x);
printf("sin(%f)=%f\n", x, y);
return 0;
}
means you now need -lm when compiling:
$ gcc -Wall -Wextra -O0 -g 1.c -lm
and now you'll see this disassembled output:
...
0x00000000004006c9 <+25>: callq 0x4005b0 <__isoc99_scanf@plt>
0x00000000004006ce <+30>: movss -0x8(%rbp),%xmm0
0x00000000004006d3 <+35>: unpcklps %xmm0,%xmm0
0x00000000004006d6 <+38>: cvtps2pd %xmm0,%xmm0
0x00000000004006d9 <+41>: callq 0x4005a0 <sin@plt>
...
and the call in ltrace:
__libc_start_main(0x4006b0, 1, 0x7fffd25ecff8, 0x400720 <unfinished ...>
__isoc99_scanf(0x4007b0, 0x7fffd25ecf08, 0x7fffd25ed008, 0x400720) = 1
sin(0x7fffd25ec920, 0x7fa1a6388a20, 1, 16) = 0x7fa1a643b780
printf("sin(%f)=%f\n", 3.000000, 0.141120sin(3.000000) =0.141120
) = 23
+++ exited (status 0) +++

sb-ext:defglobal access disassembly different between REPL and .lisp file

I'm trying to make a global variable by using sb-ext:defglobal (actually by using quicklisp library global-vars which wraps it and several other implementation-specific mechanics, but it does not matter). I'm having the following code in test.lisp file:
(sb-ext:defglobal *test* 42)
(defun test () *test*)
Then I call (compile-file "test.lisp") and (load "test"). After that, the test function disassembly looks like the following:
(disassemble #'test)
; disassembly for TEST
; Size: 27 bytes. Origin: #x52C519CC ; TEST
; CC: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; D0: 488945F8 MOV [RBP-8], RAX
; D4: 488B05C5FFFFFF MOV RAX, [RIP-59] ; '*TEST*
; DB: 488B5001 MOV RDX, [RAX+1]
; DF: 488BE5 MOV RSP, RBP
; E2: F8 CLC
; E3: 5D POP RBP
; E4: C3 RET
; E5: CC10 INT3 16 ; Invalid argument count trap
But when I define the very same function in REPL and try to disassemble it, it looks like this:
(defun test2 () *test*)
(disassemble #'test2)
; disassembly for TEST2
; Size: 24 bytes. Origin: #x52C51A5C ; TEST2
; 5C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 60: 488945F8 MOV [RBP-8], RAX
; 64: 488B142590EA3F50 MOV RDX, [#x503FEA90] ; *TEST*
; 6C: 488BE5 MOV RSP, RBP
; 6F: F8 CLC
; 70: 5D POP RBP
; 71: C3 RET
; 72: CC10 INT3 16 ; Invalid argument count trap
It is clear that when compiled from the REPL, the code accessing the global variable is a bit more efficient: it compiles to just one indirect MOV from a static address, whereas when compiled from the file, the access boils down to two indirect MOVs.
I expect to use those global variables in some tight loops, so I'm concerned about even small gains in efficiency. So my question is: why does the disassembly for the same function differ between the REPL and the .lisp file? Am I missing something obvious here?

Problem with recursive factorial in x86_64 Assembly

I'm new to assembly language and tried to write the following code on my own. The problem is that it does not calculate the factorial of a number correctly: it always prints 1 in the terminal. I'd like to know why it is not working.
.text
mystring1: .asciz "Assignment 4: recursion\nType any number to calculate the factorial of that number:\n" # string for printing message
formatstr: .asciz "%ld" # format string for printing number
mystring2: .asciz "\n" # string for printing a new line
.global main # make the main label visible
main:
pushq %rbp # store the caller's base pointer
movq %rsp, %rbp # initialise the base pointer
movq $0, %rax # no vector registers in use for printf
movq $mystring1, %rdi # load address of a string
call printf # call the printf subroutine
call inout # call the inout subroutine
movq $0, %rax # no vector registers in use for printf
movq $mystring2, %rdi # load address of a string
call printf
jmp end
inout:
pushq %rbp # push the base pointer
movq %rsp, %rbp # copy the stack pointer to rbp
subq $16, %rsp # reserve stack space for variable
leaq -8(%rbp), %rsi # load address of stack variable in rsi
movq $formatstr, %rdi # load first argument of scanf
movq $0, %rax # no vector registers in use for scanf
call scanf # call scanf routine
movq -8(%rbp), %rsi # move the address of the variable to rsi
call factorial
movq $0, %rax # no vector registers in use for printf
movq $formatstr, %rdi # move the address formatstring to rdi
call printf # print the result
movq %rbp, %rsp # copy rbp to rsp
popq %rbp # pop rbp from the stack
ret # return from the subroutine
factorial:
cmpq $1, %rsi
jle factend
pushq %rbx
movq %rsi, %rbx
subq $1, %rsi
call factorial
mulq %rbx
popq %rbx
ret
factend:
movq $1, %rax
ret
end:
mov $0, %rdi # load program exit code
call exit # exit the program
The pseudocode of my code:
long rfact(long n)
{
long result;
if (n <= 1)
{
result = 1;
}
else
{
result = n * rfact(n - 1);
}
return result;
}
You're returning the result of your factorial in rax, but your caller assumes that it is in rsi. Right after the call to factorial returns, the caller should move the result from rax to where it is needed (rsi in this case, since that register carries the second argument to printf).
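The same point can be sketched in C. This is an illustration rather than the asker's code: print_fact is a hypothetical stand-in for the inout routine, and the register names in the comments assume the x86-64 System V calling convention that the assembly above already uses.

```c
#include <stdio.h>

/* Recursive factorial; at the assembly level the return value comes
   back in rax. */
long rfact(long n)
{
    if (n <= 1)
        return 1;
    return n * rfact(n - 1);
}

/* Hypothetical caller mirroring inout: the result has to be handed to
   printf explicitly as its second argument (rsi at the assembly level). */
long print_fact(long n)
{
    long result = rfact(n);   /* result arrives in rax */
    printf("%ld", result);    /* ...and must be passed on in rsi */
    return result;
}
```

In the assembly, the equivalent of the result variable is a single movq %rax, %rsi placed between call factorial and the setup for printf.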

What is the purpose of the LEA instructions in this function and what does the overall recursion do?

I have been trying to work out what the following recursive function does:
func4:
0x08048cfa <+0>: push edi
0x08048cfb <+1>: push esi
0x08048cfc <+2>: push ebx
0x08048cfd <+3>: mov ebx,DWORD PTR [esp+0x10] // First arg
0x08048d01 <+7>: mov edi,DWORD PTR [esp+0x14] // Second arg
0x08048d05 <+11>: test ebx,ebx // if (ebx == 0) { eax = 0; return ???;}
0x08048d07 <+13>: jle 0x8048d34 <func4+58>
0x08048d09 <+15>: mov eax,edi
0x08048d0b <+17>: cmp ebx,0x1 // if (ebx == 1) {return ???;}
0x08048d0e <+20>: je 0x8048d39 <func4+63>
0x08048d10 <+22>: sub esp,0x8
0x08048d13 <+25>: push edi
0x08048d14 <+26>: lea eax,[ebx-0x1]// eax = ebx-1
0x08048d17 <+29>: push eax
0x08048d18 <+30>: call 0x8048cfa <func4>
0x08048d1d <+35>: add esp,0x8 // esp += 8
0x08048d20 <+38>: lea esi,[edi+eax*1] // esi = edi + eax
0x08048d23 <+41>: push edi
0x08048d24 <+42>: sub ebx,0x2 // ebx -= 2
0x08048d27 <+45>: push ebx
0x08048d28 <+46>: call 0x8048cfa <func4>
0x08048d2d <+51>: add esp,0x10 // esp += 10
0x08048d30 <+54>: add eax,esi // eax += esi
0x08048d32 <+56>: jmp 0x8048d39 <func4+63>
0x08048d34 <+58>: mov eax,0x0 // eax = 0
0x08048d39 <+63>: pop ebx
0x08048d3a <+64>: pop esi
0x08048d3b <+65>: pop edi
0x08048d3c <+66>: ret
To date, I have figured out that it takes ebx, decrements it by one, passes it back to itself and recurses until it hits one of the base cases, then moves on to the next step of the recursion. However, I haven't fully understood what that branch of the recursion does, or what esp is doing in this context.
Any hints as to how to proceed? I have already stepped through it quite a few times with gdb, but have not really noticed any sort of pattern that would help me determine what is happening.
It seems that you don't know that the result is returned in the eax register. With that in mind, the code is not difficult to understand. Assuming that the cdecl calling convention is used (because the stack is cleaned up by the caller), it is the same as this JS function:
function func4(a, b)
{
if (a <= 0) return 0;
if (a == 1) return b;
return b + func4(a-1, b) + func4(a-2, b);
}
and here is the asm code with comments:
func4:
0x08048cfa <+0>: push edi ; save non-volatile registers
0x08048cfb <+1>: push esi
0x08048cfc <+2>: push ebx
0x08048cfd <+3>: mov ebx, [esp+0x10] ; ebx <- a
0x08048d01 <+7>: mov edi, [esp+0x14] ; edi <- b
0x08048d05 <+11>: test ebx, ebx ; if (a <= 0)
0x08048d07 <+13>: jle 0x8048d34 ; return 0
0x08048d09 <+15>: mov eax, edi ; result <- b
0x08048d0b <+17>: cmp ebx, 0x1 ; if (a == 1)
0x08048d0e <+20>: je 0x8048d39 ; return result;
0x08048d10 <+22>: sub esp, 0x8 ; padding, presumably to keep the stack 16-byte aligned at the call
0x08048d13 <+25>: push edi ; passing 2nd arguments
0x08048d14 <+26>: lea eax, [ebx-0x1] ;
0x08048d17 <+29>: push eax ; passing 1st arguments
0x08048d18 <+30>: call 0x8048cfa<func4> ; eax = func4(a - 1, b)
0x08048d1d <+35>: add esp, 0x8 ; clean up the stack after calling
0x08048d20 <+38>: lea esi, [edi+eax*1] ; temp = b + func4(a - 1, b)
0x08048d23 <+41>: push edi ; passing 2nd arguments
0x08048d24 <+42>: sub ebx, 0x2 ;
0x08048d27 <+45>: push ebx ; passing 1st arguments
0x08048d28 <+46>: call 0x8048cfa<func4> ; eax = func4(a - 2, b)
0x08048d2d <+51>: add esp, 0x10 ; clean up the stack and the extra 8 bytes
0x08048d30 <+54>: add eax, esi ; result = func4(a - 2, b) + temp
0x08048d32 <+56>: jmp 0x8048d39 ;
0x08048d34 <+58>: mov eax, 0x0 ; jump to here when a <= 0
0x08048d39 <+63>: pop ebx
0x08048d3a <+64>: pop esi
0x08048d3b <+65>: pop edi
0x08048d3c <+66>: ret
LEA is meant for calculating memory addresses, but it is widely used for fused multiplication and addition because it is quick and convenient. Two more advantages are: 1) you can write the result to a register different from the source registers; 2) it doesn't affect the flags.
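For experimenting outside gdb, here is the same function as a C sketch, together with a small model of what a LEA operand computes. The name lea_model and its parameters are hypothetical and exist only to illustrate the base + index * scale + displacement form.

```c
/* C rendering of the disassembled routine (cdecl, two int arguments). */
int func4(int a, int b)
{
    if (a <= 0)
        return 0;
    if (a == 1)
        return b;
    return b + func4(a - 1, b) + func4(a - 2, b);
}

/* Hypothetical model of a LEA operand: base + index * scale + displacement,
   computed without touching memory or the flags. */
int lea_model(int base, int index, int scale, int disp)
{
    return base + index * scale + disp;
}
```

In these terms, lea eax,[ebx-0x1] is lea_model(ebx, 0, 1, -1) and lea esi,[edi+eax*1] is lea_model(edi, eax, 1, 0). With b = 1, func4 returns 1, 2, 4, 7, 12 for a = 1 to 5, which is the Fibonacci-like growth you would expect from the two recursive calls.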

pass by reference in assembly

I am trying to write a program to calculate the exponential of a number using ARM-C inter-working. I am using an LPC1769 (Cortex-M3) for debugging. The following is the code:
/*here is the main.c file*/
#include<stdio.h>
#include<stdlib.h>
extern int Start (void);
extern int Exponentiatecore(int *m,int *n);
void print(int i);
int Exponentiate(int *m,int *n);
int main()
{
Start();
return 0;
}
int Exponentiate(int *m,int *n)
{
if (*n==0)
return 1;
else
{
int result;
result=Exponentiatecore(m,n);
return (result);
}
}
void print(int i)
{
printf("value=%d\n",i);
}
this is the assembly code which complements the above C code
.syntax unified
.cpu cortex-m3
.thumb
.align
.global Start
.global Exponentiatecore
.thumb
.thumb_func
Start:
mov r10,lr
ldr r0,=label1
ldr r1,=label2
bl Exponentiate
bl print
mov lr,r10
mov pc,lr
Exponentiatecore: // r0-&m, r1-&n
mov r9,lr
ldr r4,[r0]
ldr r2,[r1]
loop:
mul r4,r4
sub r2,#1
bne loop
mov r0,r4
mov lr,r9
mov pc,lr
label1:
.word 0x02
label2:
.word 0x03
However, during the debug session I encounter a HardFault error on the execution of "Exponentiatecore(m,n)",
as seen in the debug window:
Name : HardFault_Handler
Details:{void (void)} 0x21c <HardFault_Handler>
Default:{void (void)} 0x21c <HardFault_Handler>
Decimal:<error reading variable>
Hex:<error reading variable>
Binary:<error reading variable>
Octal:<error reading variable>
Am I causing some stack corruption during alignment, or is there a mistake in my interpretation?
Please help.
Thank you in advance.
There are several problems with your code. The first is that you have an infinite loop because your SUB instruction is not setting the flags. Change it to SUBS. The next problem is that you're manipulating the LR register unnecessarily. You don't call other functions from Exponentiatecore, so don't touch LR. The last instruction of the function should be "BX LR" to return to the caller. Problem #3 is that your multiply instruction is wrong. Besides taking 3 parameters, if you multiplied the number by itself, it would grow too quickly. For example:
ExponentiateCore(10, 4);
Values through each loop:
R4 = 10, n = 4
R4 = 100, n = 3
R4 = 10000, n = 2
R4 = 100,000,000 n = 1
Problem #4 is that you're changing a non-volatile register (R4). Unless you save/restore them, you're only allowed to trash R0-R3. Try this instead:
Start:
stmfd sp!,{lr}
ldr r0,=label1
ldr r1,=label2
bl Exponentiatecore // no need to call C again
bl print
ldmfd sp!,{pc}
Exponentiatecore: // r0-&m, r1-&n
ldr r2,[r0] // r2 = base value m
ldr r1,[r1] // r1 = exponent n
mov r0,#1 // result starts at 1, which is also the answer for an exponent of 0
cmp r1,#0 // special case for exponent value of 0
beq done // early exit
loop:
mul r0,r0,r2 // multiply the running result by the original value, n times
subs r1,r1,#1
bne loop
done:
bx lr
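A C reference model can help when checking the assembly in the debugger. This is a sketch under the assumption that the routine should return m raised to the power n for n >= 0; both function names are hypothetical. The second function reproduces the "grows too quickly" behaviour described above, where the running value is multiplied by itself on every pass.

```c
/* Reference model: returns (*m) raised to the power (*n), for *n >= 0. */
int exponentiate_ref(const int *m, const int *n)
{
    int result = 1;              /* covers the *n == 0 case */
    for (int i = 0; i < *n; i++)
        result *= *m;            /* multiply by the base, not by the running value */
    return result;
}

/* What squaring the running value on every pass computes instead. */
int square_each_pass(int value, int passes)
{
    for (int i = 0; i < passes; i++)
        value *= value;
    return value;
}
```

With the values from the question (m = 2, n = 3) the reference model gives 8, while square_each_pass(10, 3) already reaches 100,000,000, matching the table above.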
I just added
Start:
push {r4-r11,lr}
...
pop {r4-r11,pc}
Exponentiatecore: @ r0-&m, r1-&n
push {r4-r11,lr}
...
pop {r4-r11,pc}
and removed bl print from Start, and everything works fine.

using SHLIB to compile and load standalone Rcpp function

I am trying to compile the following function with SHLIB (saved as foo.cpp):
#include <Rcpp.h>
RcppExport SEXP foo( SEXP x, SEXP y){
Rcpp::NumericVector xx(x), yy(y) ;
int n = xx.size() ;
Rcpp::NumericVector res( n ) ;
double x_ = 0.0, y_ = 0.0 ;
for( int i=0; i<n; i++){
x_ = xx[i] ;
y_ = yy[i] ;
if( x_ < y_ ){
res[i] = x_ * x_ ;
} else {
res[i] = -( y_ * y_) ;
}
}
return res ;
}
I try
$ R CMD SHLIB foo.cpp
/opt/local/bin/g++-mp-4.4 -I/opt/local/lib/R/include -I/opt/local/lib/R/include/x86_64 -I/opt/local/include -fPIC -pipe -O2 -m64 -c foo.cpp -o foo.o
foo.cpp:1:18: error: Rcpp.h: No such file or directory
foo.cpp:3: error: 'RcppExport' does not name a type
make: *** [foo.o] Error 1
How do I include this file, and is this the right way to compile a standalone function with Rcpp? Of course, I have installed Rcpp with install.packages('Rcpp').
Update:
Trying to find the location of Rcpp.h in R I get:
> system.file("lib", "Rcpp.h", package="Rcpp")
[1] ""
>
However,
> Rcpp:::LdFlags()
/opt/local/lib/R/library/Rcpp/lib/x86_64/libRcpp.a>
Update 2:
Looking at http://www.mail-archive.com/r-help@r-project.org/msg79185.html, I tried
$ PKG_CPPFLAGS=`Rscript -e 'Rcpp:::CxxFlags()'` \
> PKG_LIBS=`Rscript -e 'Rcpp:::LdFlags()'` \
> R CMD SHLIB foo.cpp
/opt/local/bin/g++-mp-4.4 -I/opt/local/lib/R/include -I/opt/local/lib/R/include/x86_64 -I/opt/local/lib/R/library/Rcpp/include -I/opt/local/include -fPIC -pipe -O2 -m64 -c foo.cpp -o foo.o
/opt/local/bin/g++-mp-4.4 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/opt/local/lib -o foo.so foo.o /opt/local/lib/R/library/Rcpp/lib/x86_64/libRcpp.a -L/opt/local/lib/R/lib/x86_64 -lR
and it generated foo.o and foo.so. How do I import this in R now?
Update 3:
So it can be loaded from dyn.load as
> dyn.load("foo.so")
> is.loaded("foo")
[1] TRUE
It can be called successfully as
> .Call("foo",x=as.numeric(c(1,2,3)),y=as.numeric(c(4,5,6)))
[1] 1 4 9
Although the function is not visible as such.
> foo
Error: object 'foo' not found
Your question is clearly addressed in Question 2.4. of the Rcpp-FAQ.
The answer I found is that SHLIB needs to be provided the location of the Rcpp files. This can be done as
$ PKG_CPPFLAGS=`Rscript -e 'Rcpp:::CxxFlags()'` \
> PKG_LIBS=`Rscript -e 'Rcpp:::LdFlags()'` \
> R CMD SHLIB foo.cpp
Then, the compiled file can be loaded in R as
> dyn.load("foo.so")
and it can be called in R as
> .Call("foo",c(1,2,3),c(4,5,6))
