What is the meaning of "T" symbol address = 0 in nm output

I have two static libraries that were compiled differently. At this point, I don't understand what the difference is.
What I want to understand is this: for the same symbol, the address is 0 in one library and nonzero in the other. What does that mean?
$ nm works/libdriver.a | grep mbedtls_cipher_setup
0000000000000487 T tls_cipher_setup
$ nm not_works/libdriver.a | grep mbedtls_cipher_setup
0000000000000000 T tls_cipher_setup
What difference does this make when the linker uses these libraries?
In my case, if I use the works/libdriver.a library, linking succeeds.
But when I use the not_works/libdriver.a library, the same link step produces multiple-definition errors, and the symbol tls_cipher_setup is one of many that trigger this error.

Related

What are ppearson and spearson in datamash, and how do you use them?

When I look at the documentation, there is no "correlation", but there is "ppearson" and "spearson". They are mentioned exactly once, as a "group-by statistical operation." But .. how exactly are they defined?
Also, when I try to use one, there is an error message, but I don't understand how to fix it. How do you use ppearson or spearson?
$ cat > foo.tsv
1^I2
2^I3
$ cat foo.tsv | datamash ppearson 1,2
datamash: operation ‘ppearson’ requires field pairs
EDIT: This documentation section says
GNU Datamash is designed to closely follow R project’s (https://www.r-project.org/) statistical functions. See the files/operators.R file for the R equivalent code for each of datamash’s operators. When building datamash from source code on your local computer, operators are compared to known results of the equivalent R functions.
Looking in R, I don't see an spearson:
> ?spearson
No documentation for ‘spearson’ in specified packages and libraries:
you could try ‘??spearson’

API for ldd (or objdump)?

I need to programmatically inspect the library dependencies of a given executable. Is there a better way than running the ldd (or objdump) commands and parsing their output? Is there an API available which gives the same results as ldd ?
I need to programmatically inspect the library dependencies of a given executable.
I am going to assume that you are using an ELF system (probably Linux).
Dynamic library dependencies of an executable or a shared library are encoded as a table of Elf{32,64}_Dyn entries in the PT_DYNAMIC segment of the library or executable. ldd (indirectly, but that's an implementation detail) interprets these entries and then uses various details of the system configuration and/or the LD_LIBRARY_PATH environment variable to locate the needed libraries.
You can print the contents of PT_DYNAMIC with readelf -d a.out. For example:
$ readelf -d /bin/date
Dynamic section at offset 0x19df8 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x3000
0x000000000000000d (FINI) 0x12780
0x0000000000000019 (INIT_ARRAY) 0x1a250
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x1a258
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x308
0x0000000000000005 (STRTAB) 0xb38
0x0000000000000006 (SYMTAB) 0x358
0x000000000000000a (STRSZ) 946 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x1b000
0x0000000000000002 (PLTRELSZ) 1656 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x2118
0x0000000000000007 (RELA) 0x1008
0x0000000000000008 (RELASZ) 4368 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffffe (VERNEED) 0xf98
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0xeea
0x000000006ffffff9 (RELACOUNT) 170
0x0000000000000000 (NULL) 0x0
This tells you that the only library needed for this binary is libc.so.6 (the NEEDED entry).
If your real question is "what other libraries does this ELF binary require", then that is pretty easy to obtain: just look for DT_NEEDED entries in the dynamic section. Doing this programmatically is rather easy:
1. Locate the table of program headers (the ELF file header .e_phoff tells you where it starts).
2. Iterate over them to find the one with PT_DYNAMIC .p_type.
3. That segment contains a set of fixed-size Elf{32,64}_Dyn records.
4. Iterate over them, looking for ones with .d_tag == DT_NEEDED.
Voila.
P.S. There is a bit of a complication: the strings, such as libc.so.6, are not part of PT_DYNAMIC. But there is a pointer to where they are in the entry with .d_tag == DT_STRTAB. See this answer for example code.
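The steps above can be sketched in C roughly as follows. This is a minimal sketch, not a replacement for ldd: it assumes a 64-bit ELF file in the host's byte order on a system that provides <elf.h>, and the names print_needed and read_at are hypothetical helpers, not an existing API. It also covers the P.S.: DT_STRTAB holds a virtual address, which the sketch translates to a file offset through the PT_LOAD segment that contains it.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read `size` bytes at file offset `off`; returns 0 on success. */
static int read_at(FILE *f, long off, void *buf, size_t size)
{
    if (fseek(f, off, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, size, f) == size ? 0 : -1;
}

/*
 * Hypothetical helper: print the name of every DT_NEEDED entry of the
 * 64-bit ELF file at `path` to `out`, one per line.
 * Returns the number of entries found, or -1 on any error.
 */
int print_needed(const char *path, FILE *out)
{
    Elf64_Ehdr eh;
    Elf64_Phdr *ph = NULL;
    Elf64_Dyn *dyn = NULL;
    const Elf64_Phdr *dynph = NULL;
    size_t ndyn = 0;
    long stroff = -1;
    int count = -1;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    if (read_at(f, 0, &eh, sizeof eh) != 0 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phnum == 0 || eh.e_phentsize != sizeof(Elf64_Phdr))
        goto done;

    /* Step 1: load the program header table, located via e_phoff. */
    ph = malloc(eh.e_phnum * sizeof *ph);
    if (ph == NULL ||
        read_at(f, (long)eh.e_phoff, ph, eh.e_phnum * sizeof *ph) != 0)
        goto done;

    /* Step 2: find the program header whose p_type is PT_DYNAMIC. */
    for (int i = 0; i < eh.e_phnum; i++)
        if (ph[i].p_type == PT_DYNAMIC)
            dynph = &ph[i];
    if (dynph == NULL || dynph->p_filesz < sizeof *dyn)
        goto done;

    /* Step 3: load the fixed-size Elf64_Dyn records of that segment. */
    ndyn = dynph->p_filesz / sizeof *dyn;
    dyn = malloc(ndyn * sizeof *dyn);
    if (dyn == NULL ||
        read_at(f, (long)dynph->p_offset, dyn, ndyn * sizeof *dyn) != 0)
        goto done;

    /* DT_STRTAB is a virtual address: map it to a file offset through
       the PT_LOAD segment that contains it. */
    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag != DT_STRTAB)
            continue;
        Elf64_Addr va = dyn[i].d_un.d_ptr;
        for (int j = 0; j < eh.e_phnum; j++)
            if (ph[j].p_type == PT_LOAD && va >= ph[j].p_vaddr &&
                va - ph[j].p_vaddr < ph[j].p_filesz)
                stroff = (long)(va - ph[j].p_vaddr + ph[j].p_offset);
    }
    if (stroff < 0)
        goto done;

    /* Step 4: print the string referenced by each DT_NEEDED record. */
    count = 0;
    for (size_t i = 0; i < ndyn && dyn[i].d_tag != DT_NULL; i++) {
        char name[256];
        size_t n;
        if (dyn[i].d_tag != DT_NEEDED)
            continue;
        if (fseek(f, stroff + (long)dyn[i].d_un.d_val, SEEK_SET) != 0 ||
            (n = fread(name, 1, sizeof name - 1, f)) == 0) {
            count = -1;
            goto done;
        }
        name[n] = '\0';   /* names longer than 255 bytes are truncated */
        fprintf(out, "%s\n", name);
        count++;
    }

done:
    free(dyn);
    free(ph);
    fclose(f);
    return count;
}
```

Calling print_needed("/bin/date", stdout) on the binary shown earlier would be expected to print libc.so.6. Unlike ldd, this only lists the names recorded in the file; it does not locate the libraries on disk or follow their own dependencies.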

Name for GNU Make $(var:=suffix) syntax

Evidently, GNU Make supports the syntax $(var:=suffix), which does the same thing as $(addsuffix suffix,$(var)) as far as I can tell, except that suffix can contain , in the := version without the use of a variable.
What is this form of expansion called?
Evidently it operates on whitespace-delimited words, producing a new string without modifying the original variable.
This file
# Makefile
words=cat dog mouse triangle
$(info $(words:=.ext))
$(info $(words:=.ext))
all:
	@true
produces the following when run:
$ make
cat.ext dog.ext mouse.ext triangle.ext
cat.ext dog.ext mouse.ext triangle.ext

Is it possible to uniquely identify dynamically imported functions by their name?

I used
readelf --dyn-sym my_elf_binary | grep FUNC | grep UND
to display the dynamically imported functions of my_elf_binary, from the dynamic symbol table in the .dynsym section to be precise. Example output would be:
[...]
3: 00000000 0 FUNC GLOBAL DEFAULT UND tcsetattr@GLIBC_2.0 (3)
4: 00000000 0 FUNC GLOBAL DEFAULT UND fileno@GLIBC_2.0 (3)
5: 00000000 0 FUNC GLOBAL DEFAULT UND isatty@GLIBC_2.0 (3)
6: 00000000 0 FUNC GLOBAL DEFAULT UND access@GLIBC_2.0 (3)
7: 00000000 0 FUNC GLOBAL DEFAULT UND open64@GLIBC_2.2 (4)
[...]
Is it safe to assume that the names associated to these symbols, e.g. the tcsetattr or access, are always unique? Or is it possible, or reasonable*), to have a dynamic symbol table (filtered for FUNC and UND) which contains two entries with the same associated string?
The reason I am asking is that I am looking for a unique identifier for dynamically imported functions ...
*) Wouldn't the dynamic linker resolve all "UND FUNC symbols" with the same name to the same function anyway?
Yes, given a symbol name and the set of libraries an executable is linked against, you can uniquely identify the function. This behavior is required for linking and dynamic linking to work.
An illustrative example
Consider the following two files:
librarytest1.c:
#include <stdio.h>
int testfunction(void)
{
    printf("version 1");
    return 0;
}
and librarytest2.c:
#include <stdio.h>
int testfunction(void)
{
    printf("version 2");
    return 0;
}
Both compiled into shared libraries:
% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.1 -o liblibrarytest.so.1.0.0 librarytest1.c -lc
% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.2 -o liblibrarytest.so.2.0.0 librarytest2.c -lc
Note that we cannot put both functions by the same name into a single shared library:
% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.0 -o liblibrarytest.so.0.0.0 librarytest1.c librarytest2.c -lc
/tmp/cctbsBxm.o: In function `testfunction':
librarytest2.c:(.text+0x0): multiple definition of `testfunction'
/tmp/ccQoaDxD.o:librarytest1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
This shows that symbol names are unique within a shared library, but do not have to be among a set of shared libraries.
% readelf --dyn-syms liblibrarytest.so.1.0.0 | grep testfunction
12: 00000000000006d0 28 FUNC GLOBAL DEFAULT 10 testfunction
% readelf --dyn-syms liblibrarytest.so.2.0.0 | grep testfunction
12: 00000000000006d0 28 FUNC GLOBAL DEFAULT 10 testfunction
Now let's link our shared libraries with an executable. Consider linktest.c:
int testfunction(void);
int main(void)
{
    testfunction();
    return 0;
}
We can compile and link this against either shared library:
% gcc -o linktest1 liblibrarytest.so.1.0.0 linktest.c
% gcc -o linktest2 liblibrarytest.so.2.0.0 linktest.c
And run each of them (note I'm setting the dynamic library path so the dynamic linker can find the libraries, which are not in a standard library path):
% LD_LIBRARY_PATH=. ./linktest1
version 1%
% LD_LIBRARY_PATH=. ./linktest2
version 2%
Now let's link our executable to both libraries. Each exports the same symbol, testfunction, and each library has a different implementation of that function.
% gcc -o linktest0-1 liblibrarytest.so.1.0.0 liblibrarytest.so.2.0.0 linktest.c
% gcc -o linktest0-2 liblibrarytest.so.2.0.0 liblibrarytest.so.1.0.0 linktest.c
The only difference is the order in which the libraries are passed to the compiler.
% LD_LIBRARY_PATH=. ./linktest0-1
version 1%
% LD_LIBRARY_PATH=. ./linktest0-2
version 2%
Here is the corresponding ldd output:
% LD_LIBRARY_PATH=. ldd ./linktest0-1
linux-vdso.so.1 (0x00007ffe193de000)
liblibrarytest.so.1 => ./liblibrarytest.so.1 (0x00002b8bc4b0c000)
liblibrarytest.so.2 => ./liblibrarytest.so.2 (0x00002b8bc4d0e000)
libc.so.6 => /lib64/libc.so.6 (0x00002b8bc4f10000)
/lib64/ld-linux-x86-64.so.2 (0x00002b8bc48e8000)
% LD_LIBRARY_PATH=. ldd ./linktest0-2
linux-vdso.so.1 (0x00007ffc65df0000)
liblibrarytest.so.2 => ./liblibrarytest.so.2 (0x00002b46055c8000)
liblibrarytest.so.1 => ./liblibrarytest.so.1 (0x00002b46057ca000)
libc.so.6 => /lib64/libc.so.6 (0x00002b46059cc000)
/lib64/ld-linux-x86-64.so.2 (0x00002b46053a4000)
Here we can see that while symbols are not unique, the way the linker resolves them is defined (it appears that it always resolves to the first symbol it encounters). Note that this is a bit of a pathological case, as you normally wouldn't do this. In the cases where you would go in this direction, there are better ways of handling symbol naming so that the names are unique when exported (symbol versioning, etc.).
In summary, yes, you can uniquely identify the function given its name. If there happen to be multiple symbols by that name, you identify the proper one using the order in which the libraries are resolved (from ldd or objdump, etc.). Yes, in this case you need a bit more information than just the name, but it is possible if you have the executable to inspect.
Note that in your case, the name of the first function import is not just tcsetattr but tcsetattr@GLIBC_2.0. The @ is how the readelf program displays a versioned symbol import.
GLIBC_2.0 is a version tag that glibc uses to stay binary compatible with old binaries in the (unusual but possible) case that the binary interface to one of its functions needs to change. The original .o file produced by the compiler will just import tcsetattr with no version information, but during static linking the linker notices that the actual symbol exported by libc.so carries a GLIBC_2.0 tag, and so it creates a binary that insists on importing the particular tcsetattr symbol that has version GLIBC_2.0.
In the future there might be a libc.so that exports one tcsetattr@GLIBC_2.0 and a different tcsetattr@GLIBC_2.42, and the version tag will then be used to find which one a particular ELF object refers to.
It is possible that the same process may also use tcsetattr@GLIBC_2.42 at the same time, for example if it uses another dynamic library that was linked against a libc.so new enough to provide it. The version tags ensure that both the old binary and the new library get the function they expect from the C library.
Most libraries don't use this mechanism and instead just rename the entire library if they need to make breaking changes to their binary interfaces. For example, if you dump /usr/bin/pngtopnm you'll find that the symbols it imports from libnetpbm and libpng are not versioned. (Or at least that's what I see on my machine).
The cost of this is that you can't have a binary that links against one version of libpng and also links against another library that itself links against a different libpng version; the exported names from the two libpng versions would clash.
In most cases this is manageable enough through careful packaging practice that maintaining the library source to produce useful version tags and stay backwards compatible is not worth the trouble.
But in the particular case of the C library and a few other vital system libraries, changing the name of the library would be so extremely painful that it makes sense for the maintainers to jump through some hoops in order to ensure it will never need to happen again.
Although in most cases every symbol is unique, there are a handful of exceptions. My favorite is the multiple identical symbol import used by PAM (Pluggable Authentication Modules) and NSS (Name Service Switch). In both cases, all modules written for either interface use a standard interface with standard names. A common and frequently used example is what happens when you call gethostbyname. The NSS library will call the same function in multiple libraries to get an answer. A common configuration calls the same function in three libraries! I have seen the same function called in five different libraries from one function call, and that was not the limit, just what was useful. Special calls to the dynamic linker are needed to do this, and I have not familiarised myself with the mechanics, but there is nothing special about the linking of a library module that is loaded this way.
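For an ordinary application, the public interface for loading a module at run time and resolving a symbol in one specific library is dlopen()/dlsym(). Here is a minimal sketch of calling the same symbol name in several libraries, assuming every module exports an entry point with the signature int f(void) (as testfunction does in the example earlier). The names call_in_each and entry_fn are hypothetical, and on older glibc versions the program has to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed signature of the shared entry point, e.g. testfunction(). */
typedef int (*entry_fn)(void);

/*
 * Hypothetical helper: look up `symbol` in each library separately and
 * call every definition found. Returns how many libraries provided it.
 */
int call_in_each(const char *const libs[], size_t nlibs, const char *symbol)
{
    int called = 0;

    for (size_t i = 0; i < nlibs; i++) {
        /* RTLD_LOCAL keeps this library's symbols out of the global
           scope, so same-named symbols do not compete with each other. */
        void *handle = dlopen(libs[i], RTLD_NOW | RTLD_LOCAL);
        if (handle == NULL) {
            fprintf(stderr, "skipping %s: %s\n", libs[i], dlerror());
            continue;
        }

        /* The lookup is restricted to this library and its
           dependencies instead of the process-wide search order. */
        entry_fn fn;
        *(void **)(&fn) = dlsym(handle, symbol);
        if (fn != NULL) {
            fn();
            called++;
        }
        dlclose(handle);
    }
    return called;
}
```

With the two libraries built earlier, passing "./liblibrarytest.so.1.0.0" and "./liblibrarytest.so.2.0.0" together with the symbol "testfunction" should call both implementations from a single process, which ordinary link-time resolution does not allow.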

UNIX 'comm' utility allows for case insensitivity in BSD but not Linux (via -i flag). How can I get it in Linux?

I'm using the excellent UNIX 'comm' command line utility in an application which I developed on a BSD platform (OSX). When I deployed to my Linux production server I found out that sadly, Ubuntu Linux's 'comm' utility does not take the -i flag to indicate that the lines should be compared case-insensitive. Apparently the POSIX standard does not require the -i option.
So... I'm in a bind. I really need the -i option that works so well on BSD. I've gone so far as to try to compile the BSD comm.c source code on the Linux box, but I got:
http://svn.freebsd.org/viewvc/base/user/luigi/ipfw3-head/usr.bin/comm/comm.c?view=markup&pathrev=200559
me@host:~$ gcc comm.c
comm.c: In function ‘getline’:
comm.c:195: warning: assignment makes pointer from integer without a cast
comm.c: In function ‘wcsicoll’:
comm.c:264: warning: assignment makes pointer from integer without a cast
comm.c:270: warning: assignment makes pointer from integer without a cast
/tmp/ccrvPbfz.o: In function `getline':
comm.c:(.text+0x421): undefined reference to `reallocf'
/tmp/ccrvPbfz.o: In function `wcsicoll':
comm.c:(.text+0x691): undefined reference to `reallocf'
comm.c:(.text+0x6ef): undefined reference to `reallocf'
collect2: ld returned 1 exit status
Does anyone have any suggestions as to how to get a version of comm on Linux that supports 'comm -i'?
Thanks!
You can add the following in comm.c:
void *reallocf(void *ptr, size_t size)
{
    void *ret = realloc(ptr, size);
    if (ret == NULL) {
        free(ptr);
    }
    return ret;
}
You should be able to compile it then. Make sure comm.c has #include <stdlib.h> in it (it probably does that already).
Your compilation fails because the BSD comm.c uses reallocf(), which is not a standard C function. But it is easy to write.
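To illustrate why BSD code reaches for reallocf(): with plain realloc(), the common pattern p = realloc(p, n) leaks the old block when the call fails, because the only pointer to it is overwritten with NULL. Below is a minimal sketch of the pattern that the shim makes safe. The names reallocf_compat and append_str are hypothetical and are not taken from comm.c.

```c
#include <stdlib.h>
#include <string.h>

/* Same shim as above, under a hypothetical name so it cannot clash
   with a libc that already provides reallocf(). */
static void *reallocf_compat(void *ptr, size_t size)
{
    void *ret = realloc(ptr, size);
    if (ret == NULL)
        free(ptr);
    return ret;
}

/*
 * Append `s` to the heap string `*buf` (which may be NULL).
 * On allocation failure the old buffer has already been freed,
 * so `*buf` simply becomes NULL and -1 is returned.
 */
int append_str(char **buf, const char *s)
{
    size_t old = *buf ? strlen(*buf) : 0;
    size_t add = strlen(s);

    *buf = reallocf_compat(*buf, old + add + 1);
    if (*buf == NULL)
        return -1;
    memcpy(*buf + old, s, add + 1);   /* copies the terminating NUL too */
    return 0;
}
```

The caller can assign the result straight back to the same pointer without keeping a temporary copy, which is the convenience the BSD source relies on.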
@OP, there's no need to go to such lengths as compiling the source code yourself. Here's an alternative suggestion. Since you want case-insensitive comparison, you can just convert both files to lower case (or upper case) using another tool such as tr before you pass them to comm. Because comm expects sorted input and changing the case can change the sort order, sort again after converting.
tr 'A-Z' 'a-z' < file1 | sort > temp1
tr 'A-Z' 'a-z' < file2 | sort > temp2
comm temp1 temp2
You could try to cat both files, sort the result case-insensitively with sort -f (uniq only compares adjacent lines), and pipe that to uniq -c -i. It'll show all lines from both files, with the number of appearances in the first column. As long as the original files don't have repeated lines, all lines with a count greater than 1 in the first column are lines common to both files.
Hope it helps!
Does anyone have any suggestions as to how to get a version of comm on Linux that supports 'comm -i'?
Not quite that; but have you checked if your requirements could be satisfied by the join utility? This one does have the -i option on Linux...
