Pointers in C translated to assembly

As I understand it, the first line of the code below says: store the pointer in %rsi in %eax. If that's correct, does the second line say: add the pointer in %eax to the pointer in %rdi?
I am very confused. I know assembly doesn't have pointers; I am just speaking in terms of translating assembly to C. I must rewrite the assembly code as C code, and these two lines are killing me. Can I have clarification?
movl (%rsi), %eax
addl %eax, (%rdi)

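Since you seem to be using AT&T syntax, the parentheses dereference the address held in the register: (%rsi) is the memory that %rsi points to. AT&T syntax also puts the source operand first and the destination second, so the first line loads a 32-bit value from memory into %eax, and the second line adds %eax to the 32-bit value in the memory that %rdi points to. The C equivalent for these expressions would be:
/* Expression 1 */
unsigned int* p = some_address;
unsigned int i = *p; /* *p dereferences the address in p */
/* Expression 2 */
unsigned int* p = some_address;
unsigned int i = 8;
*p += i; /* Increase the value pointed to by p by i */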
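Putting the two instructions together gives a single C function. This is a minimal sketch, assuming the System V x86-64 calling convention, where the first pointer argument arrives in %rdi and the second in %rsi. The name add_into and its parameter names are made up for illustration, and the element type could just as well be int, since addl does not distinguish signed from unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* dst is assumed to arrive in %rdi and src in %rsi (System V x86-64 ABI). */
void add_into(unsigned int *dst, const unsigned int *src)
{
    assert(dst != NULL && src != NULL);
    unsigned int tmp = *src;  /* movl (%rsi), %eax */
    *dst += tmp;              /* addl %eax, (%rdi) */
}
```

Calling add_into(&a, &b) leaves b untouched and adds its value to a.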

Related

TASM struct initialization and pointer math issues

I am attempting to write a simple DOS test program in assembly using TASM v4.1 that walks through a structure that contains four strings of equal length, but I've hit two issues.
ideal
model small
stack 1024
struc Strings_s
s1 db 32 dup (?)
s2 db 32 dup (?)
s3 db 32 dup (?)
s4 db 32 dup (?)
ends Strings_s
codeseg
start:
mov ax, @data
mov ds, ax ; Set %DS to point to the data segment
mov cx, 4 ; Load loop count
mov si, offset mystrings.s1 ; Load seg offset of first string
start_1:
push si
call putstr ; Print asciiz string
pop si
;add si, offset (Strings_s ptr ds:0).s1 ; ***BROKEN***
add si, offset (Strings_s ptr ds:0).s2 ; FIXED
loop start_1 ; Loop
fin:
mov ax, 4C00h ; [DOS] terminate program
int 21h ; ...
putstr_0:
mov bx, 07h
mov ah, 0Eh ; [BIOS] Display character
int 10h ; ...
putstr:
lodsb ; Get next char from %SI
test al, al ; End of string?
jne putstr_0 ; no, loop
return:
ret ; Return to caller
LF equ 10
CR equ 13
dataseg
mystrings Strings_s <"One string","Two strings","Three strings","Four strings">
end start
The first issue is that I need to terminate the strings I'm declaring in the struct, but adding ,CR,LF,0 is misinterpreted as additional struct members and TASM doesn't see \r\n\0 as escape sequences.
The second issue is that I'm trying to add the length Strings_s.s1 without hard coding 32 into my code. I first tried using the sizestr directive on the struct member, but even with version t300 defined before the ideal directive, TASM considers it an undefined symbol. So then I tried the example I included using the offset and struct cast, but it ends up being encoded as add si,0.
Ideas?
EDIT: The second issue turned out to be a simple error. You need to offset to the second member of the struct. (code fixed)
EDIT2: The sizestr directive only works against text macros and is really just a simple strlen of the text after the equ directive, so it isn't what I thought it was. I also tried slipping in a Strings_len EQU $-Strings_s between the s1 and s2 members, but it incorrectly equated to 23, not 32.
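For comparison, the offset-of-the-second-member trick from the first edit is the same idea as offsetof in C. The sketch below is a C analogue only, not TASM syntax; nth_string and STRING_STRIDE are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Strings_s {
    char s1[32];
    char s2[32];
    char s3[32];
    char s4[32];
};

/* Distance from one member to the next, taken from the layout
   instead of hard-coding 32. */
enum { STRING_STRIDE = offsetof(struct Strings_s, s2) - offsetof(struct Strings_s, s1) };

/* Return a pointer to the index-th string (0..3) by stepping from s1. */
const char *nth_string(const struct Strings_s *s, size_t index)
{
    assert(index < 4);
    return (const char *)s + offsetof(struct Strings_s, s1) + index * STRING_STRIDE;
}
```

Because char arrays need no alignment padding, the stride here equals the size of one member.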

What are the details of the Juniper floating-point scenario/bug wherein the sticky bit is lost?

On some Juniper MX routers, floats are not handled correctly: the sticky bit is lost if it is shifted more than 8 bits to the right (underflow) during a calculation. Is there a workaround for this? Are there any known impacts? Has it been fixed? Is this an IEEE acceptable option? Does the issue exist in other systems?
Example with Math Details (best viewed with fixed width font, and wide screen):
1
shifts: 12345678901
4095.05615204245304994401521980762481689453125000000000 = 0x1.ffe1cbff5e3e1p+11 = 0x40affe1cbff5e3e1 = 111111111111.00001110010111111111101011110001111100001
+ 1.0000137123424794882708965815254487097263336181640625 = 0x1.0000e60e10001p+0 = 0x3ff0000e60e10001 = 1.0000000000000000111001100000111000010000000000000001
^
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1000000000000.000011100110000011100001000000000000000010s
LGRS
0101
1 2 3 4 5
mantissa bit #: 1234567890123 4567890123456789012345678901234567890123
4096.0561657547959839575923979282379150390625 = 0x1.0000e60e10001p+12 = 0x40b0000e60e10001 = 1000000000000.0000111001100000111000010000000000000001 (on "all" systems/correct)
4096.056165754795074462890625 = 0x1.0000e60e10000p+12 = 0x40b0000e60e10000 = 1000000000000.0000111001100000111000010000000000000000 (on Juniper router)
^ ^ ^ ^
Internet sources inform me that a number of MX-series routers utilize Intel x86 CPUs. The observed behavior is entirely consistent with the use of an x87 FPU for floating-point computation (as opposed to SSE or AVX), when that x87 is configured to operate in extended-precision mode.
The x87 FPU stores all operands in 80-bit registers, where each register holds a floating-point operand using 64 significand (mantissa) bits, and the integer bit of the significand is explicit. Bits 8 and 9 of the FPU control word form the precision control field, which indicates at which bit position the FPU rounds results. A setting of 2 is equivalent to double precision, while a setting of 3 means round to extended precision.
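As a small illustration of that field layout, the helpers below extract and replace bits 8 and 9 of a control-word value. This is a sketch that only manipulates a 16-bit integer; it does not touch the FPU, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 2-bit precision-control field (bits 8 and 9) of an x87 control word. */
unsigned x87_precision_control(uint16_t cw)
{
    return (cw >> 8) & 3u;
}

/* Return cw with its precision-control field replaced by pc (0..3). */
uint16_t x87_set_precision_control(uint16_t cw, unsigned pc)
{
    assert(pc <= 3u);
    return (uint16_t)((cw & ~(3u << 8)) | (pc << 8));
}
```

For a control word of 0x027f the field reads 2 (double precision); setting it to 3 gives 0x037f (extended precision).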
Most Unix-like 32-bit operating systems set the x87 precision control to 3, while Windows sets it to 2. I do not know whether modern Junos is a 32-bit or 64-bit OS. It may retain use of the x87 and an FPU precision control setting of 3 for reasons of backwards compatibility.
With x87 precision control set to 3 (extended precision) there is an issue with double rounding. Results of floating-point operations are first rounded to extended precision and stored in an internal FPU register. Later, the data is taken from the register and rounded again when this result is stored out to a memory location corresponding to a double variable.
I programmed up the specific scenario from the question on Windows64 using the Intel compiler for easy access to x87 assembly language instructions. The program dumps the two source operands a and b and the sum r in three different formats (decimal floating point, hexadecimal floating point, and binary) and also dumps the internal 80-bit representation of these operands (with a t prefix).
By defining USE_X87_EXTENDED_PRECISION as either 0 or 1 the precision control of the FPU can be set to either double precision or extended precision prior to the computation, and the value of the relevant FPU control word is shown as compute cw. With USE_X87_EXTENDED_PRECISION set to 0, the output of the program is:
original cw=027f
compute cw=027f
a=4.0950561520424530e+003 0x1.ffe1cbff5e3e1p+11 40affe1cbff5e3e1 ta=400afff0e5ffaf1f0800
b=1.0000137123424795e+000 0x1.0000e60e10001p+0 3ff0000e60e10001 tb=3fff8000730708000800
r=4.0960561657547960e+003 0x1.0000e60e10001p+12 40b0000e60e10001 tr=400b8000730708000800
However, when USE_X87_EXTENDED_PRECISION is 1, the result is:
original cw=027f
compute cw=037f
a=4.0950561520424530e+003 0x1.ffe1cbff5e3e1p+11 40affe1cbff5e3e1 ta=400afff0e5ffaf1f0800
b=1.0000137123424795e+000 0x1.0000e60e10001p+0 3ff0000e60e10001 tb=3fff8000730708000800
r=4.0960561657547951e+003 0x1.0000e60e10000p+12 40b0000e60e10000 tr=400b8000730708000400
During the second rounding, from tr to r, the round bit is 1, but the sticky bit is 0 as all trailing significand bits past the round bit are zero, so the "even" part of the default rounding mode "round to nearest or even" kicks in.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#define USE_X87_EXTENDED_PRECISION (1)
/* Raw image of an 80-bit x87 register: 64-bit significand, then sign and exponent */
typedef struct tbyte {
    uint64_t l;
    uint16_t h;
} tbyte;
/* Reinterpret the bits of a double as a 64-bit integer */
uint64_t double_as_uint64 (double a)
{
    uint64_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}
int main (void)
{
    double a = 0x1.ffe1cbff5e3e1p+11;
    double b = 0x1.0000e60e10001p+0;
    double r;
    uint16_t cw_orig, cw_comp, cw_temp;
    tbyte ta, tb, tr;
    /* Save the control word, then optionally switch to extended precision */
    __asm fstcw word ptr [cw_orig];
#if USE_X87_EXTENDED_PRECISION
    cw_temp = cw_orig | (3 << 8);
    __asm fldcw word ptr [cw_temp];
#endif // USE_X87_EXTENDED_PRECISION
    __asm fstcw word ptr [cw_comp];
    /* Compute a + b on the x87 stack, keeping copies of both operands */
    __asm fld qword ptr [a];
    __asm fld qword ptr [b];
    __asm fld st(1);
    __asm fadd st, st(1);
    /* Store the sum as a double (second rounding), then dump the 80-bit registers */
    __asm fst qword ptr [r];
    __asm fstp tbyte ptr [tr];
    __asm fstp tbyte ptr [tb];
    __asm fstp tbyte ptr [ta];
    /* Restore the original control word */
    __asm fldcw word ptr [cw_orig];
    printf ("original cw=%04x\n", cw_orig);
    printf ("compute cw=%04x\n", cw_comp);
    printf ("a=%23.16e %21.13a %016llx ta=%04x%016llx\n", a, a, double_as_uint64 (a), ta.h, ta.l);
    printf ("b=%23.16e %21.13a %016llx tb=%04x%016llx\n", b, b, double_as_uint64 (b), tb.h, tb.l);
    printf ("r=%23.16e %21.13a %016llx tr=%04x%016llx\n", r, r, double_as_uint64 (r), tr.h, tr.l);
    return EXIT_SUCCESS;
}
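The double rounding can also be reproduced without an x87 by doing the arithmetic on the significands as integers. The sketch below is a simulation under stated assumptions, not the FPU's actual datapath: it forms the exact 65-bit sum of the two operands in units of 2^-52, then rounds it either once to 53 bits or first to 64 bits and then to 53 bits, using round-to-nearest-even each time. The type wide_t and all function names are hypothetical, and the helper does not renormalize if rounding carries out of the top bit (which does not happen for these operands).

```c
#include <assert.h>
#include <stdint.h>

/* A 65-bit unsigned value: top * 2^64 + low, where top is 0 or 1. */
typedef struct { unsigned top; uint64_t low; } wide_t;

/* Shift v right by k bits (1 <= k <= 63), rounding to nearest, ties to even. */
static uint64_t round_shift(wide_t v, unsigned k)
{
    assert(k >= 1 && k <= 63 && v.top <= 1);
    uint64_t q    = ((uint64_t)v.top << (64 - k)) | (v.low >> k);
    uint64_t rem  = v.low & ((UINT64_C(1) << k) - 1);  /* discarded bits */
    uint64_t half = UINT64_C(1) << (k - 1);            /* exactly one half */
    if (rem > half || (rem == half && (q & 1)))
        q++;
    return q;
}

/* Exact sum of the two operands from the question, in units of 2^-52. */
static wide_t exact_sum(void)
{
    uint64_t a = UINT64_C(0x1ffe1cbff5e3e1) << 11;  /* 0x1.ffe1cbff5e3e1p+11 */
    uint64_t b = UINT64_C(0x10000e60e10001);        /* 0x1.0000e60e10001p+0  */
    wide_t s;
    s.low = a + b;      /* wraps modulo 2^64 */
    s.top = s.low < a;  /* carry out of bit 63 */
    return s;
}

/* One rounding, 65 -> 53 significand bits (precision control 2). */
uint64_t round_once(void)
{
    return round_shift(exact_sum(), 12);
}

/* Two roundings, 65 -> 64 -> 53 bits (precision control 3, then a store to double). */
uint64_t round_twice(void)
{
    wide_t ext = { 0, round_shift(exact_sum(), 1) };
    return round_shift(ext, 11);
}
```

round_once() yields the significand 0x10000e60e10001 and round_twice() yields 0x10000e60e10000, matching the two values of r in the program output above. Both roundings in the second path hit an exact tie, so each one rounds to even and the low-order 1 is lost.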

Accessing id of MPI_Datatype

I am trying to use PMPI wrapper to record some function parameters, e.g. MPI_Send's parameter. I need to record them and then I could use them to reconstruct content of all those parameters.
The wrapper for MPI_Send looks like this:
/* ================== C Wrappers for MPI_Send ================== */
_EXTERN_C_ int PMPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
_EXTERN_C_ int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) {
    int _wrap_py_return_val = 0;
    do_wrap_send_series((char *)"MPI_Send", buf, count, datatype, dest, tag, comm);
    _wrap_py_return_val = PMPI_Send(buf, count, datatype, dest, tag, comm);
    return _wrap_py_return_val;
}
The problem is that I can't record a pointer's value and use it later on, because pointers can differ across runs.
At least MPI_Datatype is a pointer type; correct me if I am wrong.
How I found out that MPI_Datatype is a pointer type: when I compile this, mpicc warns (on x86_64):
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘struct ompi_datatype_t *’
The definition of struct ompi_datatype_t is:
struct ompi_datatype_t {
    opal_datatype_t super; /**< Base opal_datatype_t superclass */
    /* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
    int32_t id; /**< OMPI-layers unique id of the type */
    int32_t d_f_to_c_index; /**< Fortran index for this datatype */
    struct opal_hash_table_t *d_keyhash; /**< Attribute fields */
    void* args; /**< Data description for the user */
    void* packed_description; /**< Packed description of the datatype */
    uint64_t pml_data; /**< PML-specific information */
    /* --- cacheline 6 boundary (384 bytes) --- */
    char name[MPI_MAX_OBJECT_NAME];/**< Externally visible name */
    /* --- cacheline 7 boundary (448 bytes) --- */
    /* size: 448, cachelines: 7, members: 7 */
};
typedef struct ompi_datatype_t ompi_datatype_t;
So it looks like each MPI_Datatype has a unique id.
So I tried to access the id field directly. I got this error:
error: dereferencing pointer to incomplete type ‘struct ompi_datatype_t’
ompi_datatype_t appears to be an internal data structure. Is there any way to achieve my goal?
Tool to generate PMPI wrapper: here
Generally speaking, MPI_Datatype is an opaque handle, so you cannot make any assumptions about it, especially if your wrappers should be portable.
MPI_Datatype is indeed a pointer in Open MPI, but it is a number in MPICH iirc.
(Older) Fortran uses an integer to refer to a datatype, so one option is to use the following subroutines
MPI_Fint MPI_Type_c2f(MPI_Datatype datatype);
MPI_Datatype MPI_Type_f2c(MPI_Fint datatype);
in order to convert between an MPI_Datatype and an MPI_Fint (an int unless you built Open MPI with 8-byte Fortran integers).
That being said, if you want to compare datatypes between runs, you might want to consider these subroutines
int MPI_Type_set_name(MPI_Datatype type, const char *type_name);
int MPI_Type_get_name(MPI_Datatype type, char *type_name, int *resultlen);
So you do not have to worry about race conditions nor changing the sequence in which derived datatypes are created by your app.

In Ada, is there a way to make an enumeration type act like a modulus type -- to wrap to 0 after the last of its range?

I am re-writing an encryption/compression library and it seems like it is getting to be a lot of processing per bytes processed. I would prefer to use an enumeration type when choosing which of several limited ways the encryption can go (the proper way), but when those paths become cyclical, I have to add extra code to test for type'last and type'first. I can always just write such a condition in for the type, or assign the addition/subtraction operator on the type a function to wrap around the result, but that is more code and processing that will add up quickly when it has to run every eight bytes along with everything else. Is there a way to make the operation about as efficient as if it were a simple "mod" type, like
type Modular is mod 64 ....;
for ......;
pragma ....;
type Frequency_Counter is array(Modular) of Long_Integer;
Head : Modular := (others => 0);
Freq : Frequency_Counter(Size) := (others => 0);
Encryption_Label : Modular := Hash3;
Block_Sample : Modular := Hash5;
...
Hash3 := Hash3 + 1;
Freq (Hash3):= Freq(Hash3) + 1; -- Here is where my made-on-the-fly example is focused
I think I can make the whole algorithm more efficient and use enumeration types if I can just get the enumeration type to do math in the processor in the same number of cycles as with a mod type math. I have gotten a little creative in thinking of a way, but they were too obviously not right for me to use any of them as an example. The only thing I can think might be possible exceeds my skill, and that is making a procedure using inline ASM (gas assembly language syntax) to make the operation very direct to the processor.
PS: I know this is a minor gain, alone. Any gain is appropriate for the application.
Not sure that it’ll make much difference!
Given this
package Cyclic is
   type Enum is (A, B, C, D, E);
   type Modular is mod 5;
   function Next_Enum (En : Enum) return Enum is
     (if En = Enum'Last then Enum'First else Enum'Succ (En)) --'
     with Inline_Always;
end Cyclic;
and
with Cyclic; use Cyclic;
procedure Cyclic_Use (N : Natural; E : in out Enum; M : in out Modular) is
begin
   begin
      for J in 1 .. N loop
         E := Next_Enum (E);
      end loop;
   end;
   begin
      for J in 1 .. N loop
         M := M + 1;
      end loop;
   end;
end Cyclic_Use;
and compiling using GCC 5.2.0 with -O3 (gnatmake -O3 -c -u -f cyclic_use.adb -cargs -S), the x86_64 assembler generated for the two loops is
(enumeration)
L3:
leal 1(%rsi), %ecx
addl $1, %eax
cmpb $4, %sil
cmove %r8d, %ecx
cmpl %eax, %edi
movl %ecx, %esi
jne L3
(modular)
L4:
leal -4(%rdx), %ecx
addl $1, %eax
cmpb $3, %dl
leal 1(%rdx), %r8d
movl %ecx, %edx
cmovle %r8d, %edx
cmpl %eax, %edi
jne L4
I don’t pretend to know x86_64 assembler, and I don’t know why the enumeration version compares against 4 while the modular version compares against 3, but the two loops look very similar to me, and the enumeration version is one instruction shorter.
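The two step functions being compared can be written side by side in C, which makes it easy to repeat the experiment with a C compiler and inspect its output. This is a C analogue of the Ada code above, not a translation the answer provides; the enum and function names are hypothetical.

```c
#include <assert.h>

enum letter { A, B, C, D, E, LETTER_COUNT };

/* Conditional successor, mirroring Next_Enum: wrap from the last value to the first. */
enum letter next_by_compare(enum letter e)
{
    assert(e < LETTER_COUNT);
    return (e == E) ? A : (enum letter)(e + 1);
}

/* Modular successor, mirroring M + 1 on a "mod 5" type. */
enum letter next_by_modulo(enum letter e)
{
    assert(e < LETTER_COUNT);
    return (enum letter)((e + 1) % LETTER_COUNT);
}
```

Both functions return the same value for every member of the enumeration, so any difference between them is purely a matter of the generated code.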

Struct Stuffing Incorrectly

I have the following struct:
typedef union
{
    struct
    {
        unsigned char ID;
        unsigned short Vdd;
        unsigned char B1State;
        unsigned short B1FloatV;
        unsigned short B1ChargeV;
        unsigned short B1Current;
        unsigned short B1TempC;
        unsigned short B1StateTimer;
        unsigned short B1DutyMod;
        unsigned char B2State;
        unsigned short B2FloatV;
        unsigned short B2ChargeV;
        unsigned short B2Current;
        unsigned short B2TempC;
        unsigned short B2StateTimer;
        unsigned short B2DutyMod;
    } bat_values;
    unsigned char buf[64];
} BATTERY_CHARGE_STATUS;
and I am stuffing it from an array as follows:
for(unsigned char ii = 0; ii < 64; ii++) usb_debug_data.buf[ii]=inBuffer[ii];
I can see that the array has the following (arbitrary) values:
inBuffer[0] = 80;
inBuffer[1] = 128;
inBuffer[2] = 12;
inBuffer[3] = 0;
inBuffer[4] = 23;
...
Now I want to display these values by changing the text of a QLineEdit:
str=QString::number((int)usb_debug_data.bat_values.ID);
ui->batID->setText(str);
str=QString::number((int)usb_debug_data.bat_values.Vdd);
ui->Vdd->setText(str);
str=QString::number((int)usb_debug_data.bat_values.B1State);
ui->B1State->setText(str);
...
However, the QLineEdit text values are not turning up as expected. I see the following:
usb_debug_data.bat_values.ID = 80 (correct)
usb_debug_data.bat_values.Vdd = 12 (incorrect)
usb_debug_data.bat_values.B1State = 23 (incorrect)
It seems that 'usb_debug_data.bat_values.Vdd', which is a short, is not taking its value from inBuffer[1] and inBuffer[2]. Likewise, 'usb_debug_data.bat_values.B1State' should get its value from inBuffer[3] but for some reason is picking up its value from inBuffer[4].
Any idea why this is happening?
C and C++ compilers are free to insert padding between elements of a structure, and beyond the last element, for whatever purpose they see fit (usually efficiency, but sometimes because the underlying architecture does not allow unaligned access at all).
So you'll probably find that items of two-bytes length are aligned to two-byte boundaries, so you'll end up with something like:
unsigned char ID; // 1 byte
// 1 byte filler, aligns following short
unsigned short Vdd; // 2 bytes
unsigned char B1State; // 1 byte
// 3 bytes filler, aligns following int
unsigned int myVar; // 4 bytes
Many compilers will allow you to specify how to pack structures, such as with:
#pragma pack(1)
or the gcc:
__attribute__((packed))
attribute.
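The effect of packing can be checked with offsetof. The sketch below uses the push/pop form of #pragma pack, which GCC, Clang and MSVC accept; the struct names and the three-field subset are made up for illustration, and the exact padding in the unpacked struct is platform dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: the compiler may insert filler before Vdd. */
struct padded {
    unsigned char  ID;
    unsigned short Vdd;
    unsigned char  B1State;
};

/* Same fields with packing forced to 1-byte alignment. */
#pragma pack(push, 1)
struct packed {
    unsigned char  ID;
    unsigned short Vdd;
    unsigned char  B1State;
};
#pragma pack(pop)

/* With pack(1), Vdd must directly follow the one-byte ID. */
static_assert(offsetof(struct packed, Vdd) == 1, "pack(1) removes the filler");

size_t padded_vdd_offset(void) { return offsetof(struct padded, Vdd); }
size_t packed_vdd_offset(void) { return offsetof(struct packed, Vdd); }
```

On a typical platform with 2-byte shorts, padded_vdd_offset() returns 2, which is why Vdd in the question picked up inBuffer[2] and inBuffer[3] instead of inBuffer[1] and inBuffer[2].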
If you don't want to (or can't) pack your structures, you can revert to field-by-field copying (probably best in a function; it needs <string.h> for memcpy):
void copyData (BATTERY_CHARGE_STATUS *bsc, const unsigned char *debugData) {
    /* The fields live in the bat_values member of the union */
    memcpy (&(bsc->bat_values.ID), debugData, sizeof (bsc->bat_values.ID));
    debugData += sizeof (bsc->bat_values.ID);
    memcpy (&(bsc->bat_values.Vdd), debugData, sizeof (bsc->bat_values.Vdd));
    debugData += sizeof (bsc->bat_values.Vdd);
    : : :
    memcpy (&(bsc->bat_values.B2DutyMod), debugData, sizeof (bsc->bat_values.B2DutyMod));
    debugData += sizeof (bsc->bat_values.B2DutyMod); // Not really needed
}
It's a pain that you have to keep the structure and function synchronised but hopefully it won't be changing that much.
Structs are not packed by default so the compiler is free to insert padding between members. The most common reason is to ensure some machine dependent alignment. The wikipedia entry on data structure alignment is a pretty good place to start. You essentially have two choices:
insert compiler-specific pragmas or attributes to force packing (e.g., #pragma pack(1) or __attribute__((packed))).
write explicit serialization and deserialization functions to transform your structures into and from byte arrays
I usually prefer the latter since it doesn't make my code ugly with little compiler specific adornments everywhere.
The next thing that you are likely to discover is that the byte order for multi-byte integers is also platform specific. Look up endianness for more details.
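An explicit deserialization function sidesteps both padding and byte order, because it never depends on how the compiler lays out the struct. The sketch below decodes only the first three fields and assumes the wire format is little-endian, which the question does not state; struct battery_head, read_u16_le and decode_battery_head are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the fields from the question. */
struct battery_head {
    uint8_t  id;
    uint16_t vdd;
    uint8_t  b1_state;
};

/* Read a 16-bit value stored low byte first (little-endian is an assumption). */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Decode the first four wire bytes: ID, Vdd (two bytes), B1State. */
void decode_battery_head(struct battery_head *out, const unsigned char *buf, size_t len)
{
    assert(out != NULL && buf != NULL && len >= 4);
    out->id       = buf[0];
    out->vdd      = read_u16_le(buf + 1);
    out->b1_state = buf[3];
}
```

With the sample bytes from the question (80, 128, 12, 0, ...), this gives id = 80, vdd = 128 + 12 * 256 = 3200 and b1_state = 0, regardless of how the host pads the struct.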

Resources