Extracting record from big endian data

I have the following code for network protocol implementation. As the protocol is big endian, I wanted to use the Bit_Order attribute and High_Order_First value but it seems I made a mistake.
with Ada.Unchecked_Conversion;
with Ada.Text_IO; use Ada.Text_IO;
with System; use System;
procedure Bit_Extraction is
type Byte is range 0 .. (2**8)-1 with Size => 8;
type Command is (Read_Coils,
Read_Discrete_Inputs
) with Size => 7;
for Command use (Read_Coils => 1,
Read_Discrete_Inputs => 4);
type has_exception is new Boolean with Size => 1;
type Frame is record
Function_Code : Command;
Is_Exception : has_exception := False;
end record
with Pack => True,
Size => 8;
for Frame use
record
Function_Code at 0 range 0 .. 6;
Is_Exception at 0 range 7 .. 7;
end record;
for Frame'Bit_Order use High_Order_First;
for Frame'Scalar_Storage_Order use High_Order_First;
function To_Frame is new Ada.Unchecked_Conversion (Byte, Frame);
my_frame : Frame;
begin
my_frame := To_Frame (Byte'(16#32#)); -- Big endian version of 16#4#
Put_Line (Command'Image (my_frame.Function_Code)
& " "
& has_exception'Image (my_frame.Is_Exception));
end Bit_Extraction;
Compilation is ok but the result is
raised CONSTRAINT_ERROR : bit_extraction.adb:39 invalid data
What did I forget or misunderstand?
UPDATE
The real record in fact is
type Frame is record
Transaction_Id : Transaction_Identifier;
Protocol_Id : Word := 0;
Frame_Length : Length;
Unit_Id : Unit_Identifier;
Function_Code : Command;
Is_Exception : Boolean := False;
end record with Size => 8 * 8, Pack => True;
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
Where Transaction_Identifier, Word and Length are 16-bit wide.
These ones are displayed correctly if I remove the Is_Exception field and extend Function_Code to 8 bits.
The dump of the frame to decode is as follows:
00000000 00 01 00 00 00 09 11 03 06 02 2b 00 64 00 7f
So my only problem is really to extract the 8th bit of the last byte.

So,
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
It seems you want Is_Exception to be the LSB of the last byte?
With for Frame'Bit_Order use System.High_Order_First; the LSB will be bit 7.
(Also, 16#32# will never be the big-endian version of 16#4#; the bit pattern just doesn't match.)
It may be more intuitive and clear to specify all of your fields relative to the word they're in, rather than the byte:
Unit_ID at 6 range 0..7;
Function_Code at 6 range 8 .. 14;
Is_Exception at 6 range 15 .. 15;
Given the definition of Command above, the legal values for the last byte will then be:
2 -> READ_COILS FALSE
3 -> READ_COILS TRUE
8 -> READ_DISCRETE_INPUTS FALSE
9 -> READ_DISCRETE_INPUTS TRUE
BTW, by applying your update to your original program and adding/changing the following, your program works for me:
add
with Interfaces;
add
type Byte_Array is array(1..8) of Byte with Pack;
change, since we don't know the definition
Transaction_ID : Interfaces.Unsigned_16;
Protocol_ID : Interfaces.Unsigned_16;
Frame_Length : Interfaces.Unsigned_16;
Unit_ID : Interfaces.Unsigned_8;
change
function To_Frame is new Ada.Unchecked_Conversion (Byte_Array, Frame);
change
my_frame := To_Frame (Byte_Array'(00, 01, 00, 00, 00, 09, 16#11#, 16#9#));

I finally found what was wrong.
In fact, the Modbus Ethernet Frame definition mentioned that, in case of exception, the returned code should be the function code plus 128 (0x80) (see explanation on Wikipedia). That's the reason why I wanted to represent it through a Boolean value but my representation clauses were wrong.
The correct clauses are:
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Is_Exception at 6 range 8 .. 8;
Function_Code at 6 range 9 .. 15;
end record;
This way, the Modbus network protocol is correctly modelled (or not but at least, my code is working).
I really thank egilhh and simonwright for making me find what was wrong and explain the semantics behind the aspects.
Obviously, I don't know who to reward :)
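For comparison, here is a minimal Python sketch of the same decoding, assuming the 8-byte layout from the record above and the rule that an exception reply adds 128 (0x80) to the function code. The function name decode_mbap_frame is hypothetical; the input is the first 8 bytes of the dump in the question.

```python
import struct


def decode_mbap_frame(raw: bytes) -> dict:
    """Decode the first 8 bytes of the frame; all fields are big-endian."""
    # ">HHHBB": three big-endian 16-bit words followed by two single bytes.
    transaction_id, protocol_id, frame_length, unit_id, last = struct.unpack(
        ">HHHBB", raw[:8]
    )
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "frame_length": frame_length,
        "unit_id": unit_id,
        # The most significant bit of the last byte is the exception flag.
        "is_exception": bool(last & 0x80),
        # The remaining seven bits carry the function code.
        "function_code": last & 0x7F,
    }


# First 8 bytes of the dump in the question: 00 01 00 00 00 09 11 03
frame = decode_mbap_frame(bytes.fromhex("0001000000091103"))
print(frame)
```

With the last byte changed from 03 to 83, the same function would report function code 3 with the exception flag set.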

Your original record declaration works fine (GNAT complains about the Pack, warning: pragma Pack has no effect, no unplaced components). The problem is with working out the little-endian Byte.
---------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | BE bit numbers
---------------------------------
| c c c c c c c | e |
---------------------------------
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | LE bit numbers
---------------------------------
so if you want the Command to be Read_Discrete_Inputs, the Byte needs to have BE bit 4 (LE bit 3) set i.e. LE 16#8#.
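The diagram can be checked with a short Python sketch. It assumes exactly the layout drawn above (command in BE bits 0..6, exception flag in BE bit 7, which is the LSB); pack_frame_byte and unpack_frame_byte are hypothetical helper names.

```python
def pack_frame_byte(command: int, is_exception: bool) -> int:
    """Build the byte for the original one-byte Frame layout."""
    if not 0 <= command < 128:
        raise ValueError("command must fit in 7 bits")
    # BE bits 0..6 are the upper seven bits; BE bit 7 is the LSB.
    return (command << 1) | int(is_exception)


def unpack_frame_byte(value: int) -> tuple:
    """Split a byte back into (command, is_exception)."""
    return value >> 1, bool(value & 1)


READ_COILS = 1
READ_DISCRETE_INPUTS = 4

print(hex(pack_frame_byte(READ_DISCRETE_INPUTS, False)))
print(unpack_frame_byte(0x32))
```

Under this layout, 16#32# unpacks to command 25, which is not a value of Command; that is consistent with the CONSTRAINT_ERROR reported in the question.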

Take a look at this AdaCore post on bit order and byte order to see how they handle it. After reading that, you will probably find that the bit order of your frame value is really 16#08#, which probably is not what you are expecting.
Big Endian / Little Endian typically refers to Byte order rather than bit order, so when you see that Network protocols are Big Endian, they mean Byte order. Avoid setting Bit_Order for your records. In modern systems, you will almost never need that.
Your record is only one byte in size, so byte order won't matter for it by itself. Byte order comes into play when you have larger field values (>8 bits long).

The Bit_Order attribute doesn't reverse the order in which the bits appear in memory. It simply defines whether the most significant (leftmost) bit is logically referred to as zero (High_Order_First) or the least significant bit is referred to as zero (Low_Order_First) when interpreting the First_Bit and Last_Bit offsets from the byte position in the representation clause. Keep in mind that these offsets are taken from the MSB or LSB of the scalar the record component belongs to AS A VALUE. A multibyte machine scalar exists when one or more record components with the same byte position have a Last_Bit value that exceeds the capacity of a single byte. So, for the byte positions, and the in-memory representation of such multibyte machine scalars, to carry the same meaning on a little-endian CPU as on a big-endian CPU, 'Scalar_Storage_Order must also be specified.
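The numbering rule described here can be modelled in a few lines of Python. This is only a sketch of the bit-numbering convention, not of how any compiler implements it; extract_field is a hypothetical helper, and the 16-bit scalar is assumed to have been read in big-endian byte order (as with Scalar_Storage_Order set to High_Order_First).

```python
def extract_field(scalar: int, size: int, first_bit: int, last_bit: int,
                  high_order_first: bool = True) -> int:
    """Extract bits first_bit..last_bit from a machine scalar of `size` bits."""
    width = last_bit - first_bit + 1
    if high_order_first:
        # Bit 0 is the MSB, so last_bit is the lowest-order bit of the field.
        shift = size - 1 - last_bit
    else:
        # Bit 0 is the LSB, so first_bit is the lowest-order bit of the field.
        shift = first_bit
    return (scalar >> shift) & ((1 << width) - 1)


# Bytes 6 and 7 of a frame, read as one big-endian 16-bit value.
word = int.from_bytes(bytes([0x11, 0x83]), "big")

unit_id = extract_field(word, 16, 0, 7)         # Unit_Id at 6 range 0 .. 7
is_exception = extract_field(word, 16, 8, 8)    # Is_Exception at 6 range 8 .. 8
function_code = extract_field(word, 16, 9, 15)  # Function_Code at 6 range 9 .. 15
print(unit_id, is_exception, function_code)
```

The same call with high_order_first=False counts from the other end of the value, which is why the offsets in a representation clause only make sense together with a stated bit order.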


Output R dataframe to SAS format Issue

I have a dataset that looks like this:
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
I am using this to export to an .xpt file
write.xport(df_dummy, file = "data/tmp.xpt")
lookup.xport("data/tmp.xpt")
In SAS, I use this code to import:
libname sasfile 'PATH\data';
libname xptfile xport 'PATH\data\tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
The table looks fine, but the rate doesn't show the decimal point.
In my actual dataset, there are a lot more rows but it's the same format essentially and if I run a lookup.xport I get this:
Variables in data set `MEASURES':
dataset name type format flength fdigits iformat iflength ifdigits label nobs
MEASURES ID character 0 0 0 0 29064
MEASURES MEASURE character 0 0 0 0 29064
MEASURES NUM numeric 0 0 0 0 29064
MEASURES DEN numeric 0 0 0 0 29064
MEASURES RATE numeric 0 0 0 0 29064
However, if I use the same SAS code to import this, I get something that looks completely off and I can't figure out what's causing it.
I cannot replicate your issue using R (3.4.1) and SAS (9.4 TS1M4) on Mac OS X with both being 64 bit versions. The 32/64 bit versions can cause issues sometimes.
I used R Studio and SAS UE, both freely available for education usage.
Full R code:
install.packages("SASxport")
library("SASxport")
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
write.xport(df_dummy, file = "tmp.xpt")
Full SAS Code:
libname sasfile '/folders/myfolders/';
libname xptfile xport '/folders/myfolders/tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
Your example works, even with an older version of R. Make sure your transport file has not been corrupted by transferring it between machines. A transport file is binary data with fixed-length 80-byte records, but much of the data looks like ASCII text.
SAS transport files follow the SAS V5 rules for names. Make sure that your member name and variable names are valid SAS names and are not longer than 8 characters. Character variables cannot be longer than 200 characters.
You can quickly look at the file using a simple data step. Especially for your small example. So if you see that the length is not exactly a multiple of 80 or you see that the header records do not start at the beginning of an 80 byte record then something has corrupted the file.
56 data _null_;
57 infile '/test/tmp.xpt' lrecl=80 recfm=f ;
58 input;
59 list;
60 run;
NOTE: The infile '/test/tmp.xpt' is:
Filename=/test/tmp.xpt,
Owner Name=xxxxx,Group Name=xxxxx,
Access Permission=-rw-r--r--,
Last Modified=29Sep2017:09:16:16,
File Size (bytes)=1680
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
2 CHAR SAS SAS SASLIB 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222545222225454442232332222523232302222222222222222222222223354533333333333
NUMR 3130000031300000313C92007E000000203E0E200000000000000000000000002935017A09A16A16
3 29SEP17:09:16:16
4 HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140
5 HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
6 CHAR SAS DF_DUMMYSASDATA 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222445454455454454232332222523232302222222222222222222222223354533333333333
NUMR 3130000046F45DD9313414107E000000203E0E200000000000000000000000002935017A09A16A16
7 29SEP17:09:16:16
8 HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000500000000000000000000
9 CHAR ........COMPANY ........
ZONE 00000000444544522222222222222222222222222222222222222222222222220000000022222222
NUMR 020008013FD01E900000000000000000000000000000000000000000000000000000000000000000
10 CHAR ....................................................................MEASURE
ZONE 00000000000000000000000000000000000000000000000000000000000000000000444555422222
NUMR 00000000000000000000000000000000000000000000000000000000000002000802D51352500000
11 CHAR ........ ....................
ZONE 22222222222222222222222222222222222222222222000000002222222200000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000008000000000000
12 CHAR ................................................NUM
ZONE 00000000000000000000000000000000000000000000000045422222222222222222222222222222
NUMR 000000000000000000000000000000000000000001000803E5D00000000000000000000000000000
13 CHAR ........ ........................................
ZONE 22222222222222222222222200000000222222220000000100000000000000000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
14 CHAR ............................DEN
ZONE 00000000000000000000000000004442222222222222222222222222222222222222222222222222
NUMR 000000000000000000000100080445E0000000000000000000000000000000000000000000000000
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
15 CHAR ........ ............................................................
ZONE 22220000000022222222000000010000000000000000000000000000000000000000000000000000
NUMR 00000000000000000000000000080000000000000000000000000000000000000000000000000000
16 CHAR ........RATE ........
ZONE 00000000545422222222222222222222222222222222222222222222222222220000000022222222
NUMR 01000805214500000000000000000000000000000000000000000000000000000000000000000000
17 CHAR ....... ....................................................
ZONE 00000002000000000000000000000000000000000000000000000000000022222222222222222222
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
18 HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000
19 CHAR 0001 A A ......B.......B2......0002 B A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00010000100000001000000024000000220000000002000020000000100000002400000022000000
20 CHAR 0003 C A ......B.......B2......0004 D A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00030000300000001000000024000000220000000004000040000000100000002400000022000000
21 CHAR 0005 E A ......B.......B2......
ZONE 33332222422222224A00000041000000430000002222222222222222222222222222222222222222
NUMR 00050000500000001000000024000000220000000000000000000000000000000000000000000000
NOTE: 21 records were read from the infile '/test/tmp.xpt'.
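The same sanity checks can be sketched outside SAS. The Python below assumes only what is stated above: the file length should be a multiple of 80, and header records should start on an 80-byte boundary. The function name check_xport_layout is hypothetical, and the marker search is a heuristic (observation data could in principle contain the same text).

```python
LIBRARY_HEADER = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"
HEADER_MARKER = b"HEADER RECORD*******"
RECORD_LENGTH = 80


def check_xport_layout(data: bytes) -> list:
    """Return a list of layout problems found in transport-file bytes."""
    problems = []
    if len(data) % RECORD_LENGTH != 0:
        problems.append("length %d is not a multiple of 80" % len(data))
    if not data.startswith(LIBRARY_HEADER):
        problems.append("file does not start with the library header record")
    # Every header marker should sit at the start of an 80-byte record.
    pos = data.find(HEADER_MARKER)
    while pos != -1:
        if pos % RECORD_LENGTH != 0:
            problems.append("header record at offset %d is misaligned" % pos)
        pos = data.find(HEADER_MARKER, pos + 1)
    return problems


def check_xport_file(path: str) -> list:
    """Read the file in binary mode and run the layout checks."""
    with open(path, "rb") as handle:
        return check_xport_layout(handle.read())
```

A call such as check_xport_file("tmp.xpt") returning an empty list means none of these particular problems were found; it does not prove the file is otherwise valid.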

Complex select in SQLite view

I have two tables: Security, which holds the access bit mask for a given NTFS file system scan, and FileSystemRights, which maps the well-known bit masks to their string representations. I need to create a view which exposes the expected (not just proper) string representations for a given bit mask. The problem is that several enum values are composites that combine lower values, so the idea is not to repeat the implicit values.
For example, a value of 1179817 (Security.Id = 24) should only report ReadAndExecute and Synchronize, excluding ExecuteFile, ListDirectory, Read, ReadAttributes, ReadData, ReadExtendedAttributes, ReadPermissions and Traverse, as those are all part of ReadAndExecute (e.g. ReadAndExecute & Read == Read). It's obviously correct to show them all, but a user wants to see only the non-implicit values.
I'm lost within the constraints of SQL to produce a join that behaves like this without some abysmal nested case that would be a nightmare to look at.
Does a better programmatic approach exist?
FileSystemRights
================
Id Name Value
-- ---- -----
1 None 0
2 ListDirectory 1
3 ReadData 1
4 WriteData 2
5 CreateFiles 2
6 CreateDirectories 4
7 AppendData 4
8 ReadExtendedAttributes 8
9 WriteExtendedAttributes 16
10 ExecuteFile 32
11 Traverse 32
12 DeleteSubdirectoriesAndFiles 64
13 ReadAttributes 128
14 WriteAttributes 256
15 Write 278
16 Delete 65536
17 ReadPermissions 131072
18 Read 131209
19 ReadAndExecute 131241
20 Modify 197055
21 ChangePermissions 262144
22 TakeOwnership 524288
23 Synchronize 1048576
24 FullControl 2032127
25 GenericAll 268435456
26 GenericExecute 536870912
27 GenericWrite 1073741824
28 GenericRead 2147483648
Security
========
Id FileSystemRights IdentityReference
-- ---------------- -----------------
20 2032127 BUILTIN\Administrators
21 2032127 BUILTIN\Administrators
22 2032127 NT AUTHORITY\SYSTEM
23 268435456 CREATOR OWNER
24 1179817 BUILTIN\Users
25 4 BUILTIN\Users
26 2 BUILTIN\Users
MyView
======
SELECT s.Id AS SecurityId,
f.Name
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
ORDER BY s.Id, f.Name;
Add the actual value of the name to the query.
Then wrap another query around that to filter out values for the same entry that are a subset of another value:
WITH AllValues(SecurityId, Name, Value) AS (
SELECT s.Id,
f.Name,
f.Value
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
)
SELECT SecurityId,
Name
FROM AllValues
WHERE NOT EXISTS (SELECT *
FROM AllValues AS AV2
WHERE AV2.SecurityId = AllValues.SecurityId
AND (AV2.Value & AllValues.Value) != 0
AND AV2.Value > AllValues.Value
)
ORDER BY 1, 2;
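To see what the filter does, here is a plain Python sketch of the same rule, using a subset of the FileSystemRights rows from the question. The function name explicit_rights is hypothetical. Note that, like the query, it drops a value when a larger matching value overlaps it in any bit, which is slightly broader than a strict subset test.

```python
RIGHTS = {
    "None": 0,
    "ReadData": 1,
    "WriteData": 2,
    "ReadExtendedAttributes": 8,
    "ExecuteFile": 32,
    "ReadAttributes": 128,
    "ReadPermissions": 131072,
    "Read": 131209,
    "ReadAndExecute": 131241,
    "Synchronize": 1048576,
}


def explicit_rights(mask: int, rights: dict) -> list:
    """Names whose bits are all set in mask, minus those implied by a larger one."""
    # Step 1: same condition as the join (value 0 matches only a mask of 0).
    present = [
        (name, value)
        for name, value in rights.items()
        if (mask == 0 if value == 0 else (mask & value) == value)
    ]
    # Step 2: same condition as NOT EXISTS (a larger, overlapping value wins).
    kept = [
        name
        for name, value in present
        if not any(other > value and (other & value) != 0 for _, other in present)
    ]
    return sorted(kept)


print(explicit_rights(1179817, RIGHTS))
```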

Hash Table + Binary Search

I'm using a hash table to store some values. Here are the details:
There will be roughly 1M items to store (not known before, so no perfect-hash possible).
Table is 10M large.
Hash function is MurMurHash3.
I ran some tests: storing 1M values, I get 350,000 collisions and 30 elements in the most-colliding slot of the hash table.
Are these results good?
Would it make sense to implement binary search for the lists that get created at colliding hash-table slots?
What's your advice to improve performance?
EDIT: Here is my code
var
HashList: array [0..10000000 - 1] of Integer;
for I := 0 to High(HashList) do
HashList[I] := 0;
for I := 1 to 1000000 do
begin
Y := MurmurHash3(UIntToStr(I));
Y := Y mod Length(HashList);
Inc(HashList[Y]);
if HashList[Y] > 1 then
Inc(TotalCollisionsCount);
if HashList[Y] > MostCollidingSlotItemCount then
MostCollidingSlotItemCount := HashList[Y];
end;
Writeln('Total: ' + IntToStr(TotalCollisionsCount) + ' Max: ' + IntToStr(MostCollidingSlotItemCount));
Here is the result I get:
Total: 48169 Max: 5
Am I missing something?
This is what you get when you put 1M items randomly into 10M cells
calendar_size=10000000 nperson = 1000000
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 9048262 (0.904826) 0 (0.000000) 0 0 0
1: 905064 (0.090506) 905064 (0.905064) 1 905064 905064
2: 45136 (0.004514) 90272 (0.090272) 3 135408 1040472
3: 1488 (0.000149) 4464 (0.004464) 6 8928 1049400
4: 50 (0.000005) 200 (0.000200) 10 500 1049900
----+---------+--------+----------+--------+------+--------+--------
5: 10000000 1000000 1.049900 1049900
The left column is the number of items in a cell. The second: the number of cells having this itemcount.
WRT the binary search: it is obvious that for small tables like this (maximum chain length=4, but most chains are of length=1), linear search outperforms binary search. The takeover-point is probably somewhere between 10 and 100.
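The numbers in the table are close to what a Poisson approximation predicts for uniformly random placement. The Python sketch below assumes an ideal hash (every slot equally likely); expected_cells and expected_collisions are hypothetical helper names.

```python
import math


def expected_cells(n_items: int, n_slots: int, k: int) -> float:
    """Expected number of slots holding exactly k items (Poisson approximation)."""
    lam = n_items / n_slots  # average load per slot
    return n_slots * math.exp(-lam) * lam ** k / math.factorial(k)


def expected_collisions(n_items: int, n_slots: int) -> float:
    """Expected number of items that land in an already occupied slot."""
    occupied = n_slots * (1 - math.exp(-n_items / n_slots))
    return n_items - occupied


for k in range(5):
    print(k, round(expected_cells(1_000_000, 10_000_000, k)))
print("collisions:", round(expected_collisions(1_000_000, 10_000_000)))
```

For 1M items in 10M slots this predicts roughly 48,000 collisions, in line with the Total: 48169 measured by the code in the question and far from the 350,000 first reported.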

How to see variables stored on the stack with GDB

I'm trying to figure out what is stored at a certain place on the stack with GDB. I have a statement:
cmpl $0x176,-0x10(%ebp)
In this function I'm comparing 0x176 to the -0x10(%ebp) and I am wondering if there is a way to see what is stored at -0x10(%ebp).
Assuming you have compiled with debug info, info locals will tell you about all the local variables in the current frame. After that, print (char*)&a_local - (char*)$ebp will tell you the offset from the start of a_local to %ebp, and you can usually find out which local sits at offset -0x10.
Also, if your locals have initializers, you can do info line NN to figure out which assembly instruction range corresponds to initialization of a given local, then disas ADDR0,ADDR1 to see the disassembly, and again understand which local is located at what offset.
Another alternative is to readelf -w a.out, and look for entries like this:
int foo(int x) { int a = x; int b = x + 1; return b - a; }
<1><25>: Abbrev Number: 2 (DW_TAG_subprogram)
<26> DW_AT_external : 1
<27> DW_AT_name : foo
<2b> DW_AT_decl_file : 1
<2c> DW_AT_decl_line : 1
<2d> DW_AT_prototyped : 1
<2e> DW_AT_type : <0x67>
<32> DW_AT_low_pc : 0x0
<36> DW_AT_high_pc : 0x23
<3a> DW_AT_frame_base : 0x0 (location list)
<3e> DW_AT_sibling : <0x67>
<2><42>: Abbrev Number: 3 (DW_TAG_formal_parameter)
<43> DW_AT_name : x
<45> DW_AT_decl_file : 1
<46> DW_AT_decl_line : 1
<47> DW_AT_type : <0x67>
<4b> DW_AT_location : 2 byte block: 91 0 (DW_OP_fbreg: 0)
<2><4e>: Abbrev Number: 4 (DW_TAG_variable)
<4f> DW_AT_name : a
<51> DW_AT_decl_file : 1
<52> DW_AT_decl_line : 1
<53> DW_AT_type : <0x67>
<57> DW_AT_location : 2 byte block: 91 74 (DW_OP_fbreg: -12)
<2><5a>: Abbrev Number: 4 (DW_TAG_variable)
<5b> DW_AT_name : b
<5d> DW_AT_decl_file : 1
<5e> DW_AT_decl_line : 1
<5f> DW_AT_type : <0x67>
<63> DW_AT_location : 2 byte block: 91 70 (DW_OP_fbreg: -16)
This tells you that x is stored at fbreg+0, a at fbreg-12, and b at fbreg-16. Now you just need to examine the location list to figure out how to derive fbreg from %ebp. The list for the above code looks like this:
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 00000000 00000001 (DW_OP_breg4: 4)
00000000 00000001 00000003 (DW_OP_breg4: 8)
00000000 00000003 00000023 (DW_OP_breg5: 8)
00000000 <End of list>
So for most of the body, fbreg is %ebp+8, which means that a is at %ebp-4. Disassembly confirms:
00000000 <foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 10 sub $0x10,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax # 'x' => %eax
9: 89 45 fc mov %eax,-0x4(%ebp) # '%eax' => 'a'
...
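The arithmetic above can be reproduced with a small Python sketch. It decodes the DW_OP_fbreg operand bytes shown in the readelf output (they are signed LEB128 values) and then applies the frame base of %ebp+8 taken from the location list. The helper names are hypothetical, and the frame base of 8 is specific to this example.

```python
def decode_sleb128(data: bytes) -> int:
    """Decode one signed LEB128 value from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:      # last byte of the value
            if byte & 0x40:         # sign bit set: value is negative
                result -= 1 << shift
            return result
    raise ValueError("truncated SLEB128 value")


def ebp_offset(fbreg_operand: bytes, frame_base_from_ebp: int = 8) -> int:
    """Offset from %ebp, given fbreg = %ebp + frame_base_from_ebp."""
    return decode_sleb128(fbreg_operand) + frame_base_from_ebp


# Operand bytes following the 0x91 (DW_OP_fbreg) opcode in the output above.
print("x:", ebp_offset(bytes([0x00])))
print("a:", ebp_offset(bytes([0x74])))
print("b:", ebp_offset(bytes([0x70])))
```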

About MPEG-4 headers

I examined some MPEG-4 video headers and saw some byte arrays like below at the beginning:
00 00 01 B0 01 00 00 01 B5 89 13
I know the 00 00 01 parts, but what exactly do the B0 01 and B5 89 13 parts mean? Actually, if I put this byte array in front of an MPEG-4 stream, it works fine.
But I don't know whether those values work with different MPEG-4 stream sources.
0x000001B0 -> Visual Object Sequence Start (VOSS) Code
0x000001B5 -> Visual Object Start (VOS) Code
You can find the complete MPEG-4 elementary video header details in the ISO/IEC 14496-2 documentation. Here are the details you asked for.
Visual Object Sequence Start (VOSS) Code
-> 4 bytes visual object sequence start code = long hex value of 0x000001B0
-> 8 bits profile/level indicator = 1 byte unsigned number
Visual Object Start (VOS) Code
-> 4 bytes visual object start code = long hex value of 0x000001B5
-> 1 bit has id marker flag = 1/4 nibble flag
_ID_Marker_Section_
-> 4 bits version id = 1 nibble unsigned value - only if marker is true
- version id types are ISO 14496-2 = 1
-> 3 bits visual object priority = 3/4 nibble unsigned value - only if marker is true
- priorities are 1 through to 7
-> 4 bits visual object type = 1 nibble unsigned value
- types are video = 1 ; still texture = 2 ; mesh = 3 ; face = 4
-> 1 bit video signal type = 1/4 nibble flag
- NOTE: if this is false Y has a sample range of 16 through to 235
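Applied to the bytes from the question, the field list above can be walked with a small bit reader. This Python sketch follows only the field order listed in this answer; the standard defines further conditions and fields that are not covered here, and BitReader and parse_headers are hypothetical names.

```python
class BitReader:
    """Reads big-endian bit fields from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, count: int) -> int:
        value = 0
        for _ in range(count):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_headers(data: bytes) -> dict:
    reader = BitReader(data)
    if reader.read(32) != 0x000001B0:
        raise ValueError("missing visual object sequence start code")
    fields = {"profile_level": reader.read(8)}
    if reader.read(32) != 0x000001B5:
        raise ValueError("missing visual object start code")
    fields["has_id"] = bool(reader.read(1))
    if fields["has_id"]:
        fields["version_id"] = reader.read(4)
        fields["priority"] = reader.read(3)
    fields["object_type"] = reader.read(4)  # 1 = video in the list above
    fields["video_signal_type"] = bool(reader.read(1))
    return fields


headers = parse_headers(bytes.fromhex("000001B001000001B58913"))
print(headers)
```

For the sample bytes this reads a profile/level indicator of 1, the ID marker set with version 1 and priority 1, object type 1 (video), and the video signal type flag cleared.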
