I use this hash function a lot, e.g. to record the value of a dataframe, and I wanted to see if I could break it. Why aren't these hash values identical?
This requires the digest package.
Plain text output:
> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
Additional weirdness: calculations that equal NaN have identical hash values, but those don't match the hash of NaN itself:
> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
tl;dr: this has to do with very deep details of how NaNs are represented in binary. You can work around it by using digest(., ascii = TRUE) ...
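For example, this sketch (hash values omitted, since only their equality matters) shows the workaround in action:

library(digest)
identical(digest(Inf - Inf, ascii = TRUE), digest(NaN, ascii = TRUE))
# expected: TRUE, because the ASCII serialization writes both values as "NaN"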
Following up on @Jozef's answer: note the one byte that differs (ff vs 7f) in the two serializations below ...
> base::serialize(Inf-Inf,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00
Alternatively, using pryr::bytes() ...
> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"
The Wikipedia article on floating point format/NaNs says:
Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:
sign = either 0 or 1.
biased exponent = all 1 bits.
fraction = anything except all 0 bits (since all 0 bits represents infinity).
The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 1000 (sign = 1; exponent is all ones, as required; fraction starts with 1000) whereas NaN is 0111 1111 1111 1000 (the same, but with sign = 0).
To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...
It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to fix it, but it seems reasonable that objects for which identical(x, y) is TRUE in R should have identical hashes. Note that identical() specifically ignores these differences in bit patterns via its single.NA argument (default TRUE):
single.NA: logical indicating if there is conceptually just one numeric
‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
patterns.
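For example, on this platform (a sketch):

identical(NaN, Inf - Inf)                     # TRUE: one conceptual NaN
identical(NaN, Inf - Inf, single.NA = FALSE)  # expected FALSE: bit patterns differ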
Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit comparison of the underlying bytes: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...
This has to do with digest::digest using base::serialize, which gives non-identical results for the two objects in question with ascii = FALSE, the default that digest passes to it:
identical(
base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
base::serialize(NaN, connection = NULL, ascii = FALSE)
)
# [1] FALSE
Even though
identical(Inf-Inf, NaN)
# [1] TRUE
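With ascii = TRUE, by contrast, both values are written out as the text "NaN", so the serializations (and hence the digests) should agree:

identical(
  base::serialize(Inf - Inf, connection = NULL, ascii = TRUE),
  base::serialize(NaN, connection = NULL, ascii = TRUE)
)
# expected: TRUE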
I am trying to parse the length of a received message. The length is in BCD, but when I use ReadSmallInt(), I get a reading interpreted as a hex value, not as BCD.
So, if I have a message like this:
00 84 60 00 00 00 19 02 10 70 38 00 00 0E C0 00
00 16 45 93 56 00 01 79 16 62 00 00 00 00 00 00
08 00 00 00 00 02 10 43 02 04 02 35 31 35 31 35
31 35 31 35 31 35 31 53 41 4C 45 35 31 30 30 31
32 33 34 35 36 37 38 31 32 33 34 35 36 37 38 39
30 31 32 33
I am expecting ReadSmallInt() to return 84, but instead it is returning 132, which is correct if you are reading a hex value instead of a BCD one.
According to this answer, ReadSmallInt() reads BCD, as in the examples it gets 11 and 13 (BCD) as lengths instead of 17 and 19 (hex).
I have fixed this with duct tape, but is there a more elegant way?
int calculated_length;
// read 2 bytes as a big-endian 16-bit integer (yields 132 for bytes 00 84)
calculated_length = AContext->Connection->IOHandler->ReadSmallInt();
// format that integer as a 4-digit hex string ("0084")
UnicodeString bcdLength = UnicodeString().sprintf(L"%04x", calculated_length);
// re-parse the hex string as if it were a decimal number (84)
calculated_length = bcdLength.ToInt();
ucBuffer.Length = calculated_length - 2;
AContext->Connection->IOHandler->ReadBytes(ucBuffer, calculated_length - 2);
According to this answer, ReadSmallInt reads BCD
That is incorrect. You have misinterpreted what that answer is saying. NOTHING in that answer indicates that ReadSmallInt() reads in a Binary Coded Decimal, because it doesn't, as Indy DOES NOT support reading/writing BCDs at all. ReadSmallInt() simply reads in 2 bytes and returns them as-is as a 16-bit decimal integer (swapping the byte order, if needed). So, if you need to read in a BCD instead, you will have to read in the bytes and then parse them yourself. Or find a BCD library to handle it for you.
If you re-read that other question again more carefully, in the 2 examples it gives:
24 24 00 11 12 34 56 FF FF FF FF 50 00 8B 9B 0D 0A
24 24 00 13 12 34 56 FF FF FF FF 90 02 00 0A 8F D4 0D 0A
The 3rd and 4th bytes represent the message lengths (x00 x11 and x00 x13, respectively). As 16-bit values in network byte order, they represent decimal integers 17 and 19, respectively. And if you count the bytes present, you will see those values are the correct byte lengths of those messages. So, there are no BCDs involved here.
That is different than your example. Bytes x00 x84 in network byte order represent decimal integer 132. But your message is 84 bytes in size, not 132 bytes. So clearly the bytes x00 x84 DO NOT represent a 16-bit decimal value, so ReadSmallInt() is the wrong method to use in the first place.
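Just to sanity-check that arithmetic (a sketch in R, used here only because it is convenient for poking at bytes; it mirrors what ReadSmallInt() does with network byte order):

readBin(as.raw(c(0x00, 0x11)), "integer", size = 2, endian = "big")  # 17
readBin(as.raw(c(0x00, 0x13)), "integer", size = 2, endian = "big")  # 19
readBin(as.raw(c(0x00, 0x84)), "integer", size = 2, endian = "big")  # 132, not 84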
In your "duct tape" code, you are taking the decimal value that ReadSmallInt() returns (132), converting it to a hex string ('0084'), and then parsing that to a decimal value (84). There is no method in Indy that will do that kind of conversion for you.
That "works" in your case, but whether or not that is the correct conversion to perform, I could not say for sure as you have not provided any details about the protocol you are dealing with. But, if you think the bytes represent a BCD then you should interpret the bytes in terms of an actual BCD.
In a packed BCD, a byte can represent a 2-digit number. In this case, byte x84 (10000100b) contains two nibbles 1000b (8) and 0100b (4), thus put together they form decimal 84, which is calculated as follows:
BYTE b = 0x84;
int len = (int((b >> 4) & 0x0F) * 10) + int(b & 0x0F);
Now, how that extends to multiple bytes in a BCD, I'm not sure, as my experience with BCDs is very limited. But, you are going to have to figure that out if you need to handle message lengths greater than 99 bytes, which is the highest decimal that a single BCD byte can represent.
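For what it's worth, the usual convention for multi-byte packed BCD is simply that each additional byte contributes two more decimal digits. A sketch of that digit arithmetic (written in R only because it is handy here; bcd_to_int is a hypothetical helper, and the same logic ports directly to C++):

bcd_to_int <- function(bytes) {
  value <- 0L
  for (b in as.integer(bytes)) {
    hi <- bitwAnd(bitwShiftR(b, 4L), 0x0FL)  # high nibble: tens digit
    lo <- bitwAnd(b, 0x0FL)                  # low nibble: ones digit
    value <- value * 100L + hi * 10L + lo
  }
  value
}
bcd_to_int(as.raw(c(0x00, 0x84)))  # expected: 84
bcd_to_int(as.raw(c(0x01, 0x23)))  # expected: 123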
I have a proprietary client application that sends and receives TCP data packets to/from a network device like this:
Sent: [14 bytes]
01 69 80 10 01 0E 0F 00 00 00 1C 0D 64 82 .i..........d.
Received: [42 bytes] [+00:000]
01 69 80 10 01 2A 00 D0 DC CC 0C BB C0 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .i...*.......#..................
00 00 00 00 00 00 1C 0D F6 BE ..........
I need to make the same requests with Lua. I've found some working examples (for ex) of such communication, but I can't understand what string I should give as an argument to
tcp:send("string");
Should I give it a string of hex, i.e.
'01698010010E0F0000001C0D6482'
Or should I first convert the hex to ASCII? If so, then how (zero bytes don't convert to printable symbols)?
You should give it the string you want to send. If you write "016980..." it's a string containing decimal values 48 (ASCII digit 0), 49 (ASCII digit 1), 54 (ASCII digit 6), and so on. That is not what you want to send: you want to send decimal values 1, 105 (hex 69), 128 (hex 80), and so on.
Luckily, Lua strings can hold any bytes (unlike e.g. Python strings). So you just have to write a string with those bytes. You can write any byte into a string using \x and then a 2-digit hex code. So you could write your call like this:
tcp:send("\x01\x69\x80\x10\x01\x0E\x0F\x00\x00\x00\x1C\x0D\x64\x82")
If you are using a Lua version older than 5.2, \x is not available, but you can still use \ followed by a 3-digit decimal code, e.g. "\001\105\128" for the first three bytes above.
I'm trying to get the checksum of a file compressed with R.utils::gzip to match that of one compressed with the compress = "gzip" argument of data.table::fwrite, but I keep getting different results. Here is an example:
library(data.table); library(R.utils); library(digest)
dt <- data.frame(a = c(1, 2, 3))
fwrite(dt, "r-utils.csv")
gzip("r-utils.csv")
fwrite(dt, "datatable-v1.csv.gz", compress = "gzip")
digest(file = "r-utils.csv.gz")
#> [1] "8d4073f4966f94ac5689c6e85de2d92d"
digest(file = "datatable-v1.csv.gz")
#> [1] "5d58f9eeefb166c6d50ac00f3215e689"
Initially I thought that fwrite was storing the filename and timestamp in the output file (per the usual gzip behaviour without the --no-name option), but that doesn't appear to be the case, since I get the same checksum across different calls to fwrite
fwrite(dt, "datatable-v2.csv.gz", compress = "gzip")
digest(file = "datatable-v2.csv.gz")
#> [1] "5d58f9eeefb166c6d50ac00f3215e689"
Any ideas on what might be causing the difference?
PS. Incidentally, the checksums of the uncompressed files are the same
$ md5sum datatable-v1.csv
7e034138dc91aa575d929c1ed65aa67c datatable-v1.csv
$ md5sum r-utils.csv
7e034138dc91aa575d929c1ed65aa67c r-utils.csv
If you look at the bytes, you can see they are different
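One way to inspect those bytes from within R is base readBin() (a sketch; the n values are arbitrary):

readBin("r-utils.csv.gz", what = "raw", n = 32)
readBin("datatable-v2.csv.gz", what = "raw", n = 64)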
r-utils.csv.gz
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 1F 8B 08 00 00 00 00 00 00 06 4B E4 E5 32 E4 E5 ..........Kde2de
00000010: 32 E2 E5 32 E6 E5 02 00 21 EB 62 BF 0C 00 00 00 2be2fe..!kb?....
and
datatable-v2.csv.gz
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 1F 8B 08 00 00 00 00 00 00 0A 4B E4 E5 02 00 56 ..........Kde..V
00000010: EF 2F E3 03 00 00 00 1F 8B 08 00 00 00 00 00 00 o/c.............
00000020: 0A 33 E4 E5 32 E2 E5 32 E6 E5 02 00 49 C4 FF 4D .3de2be2fe..ID.M
00000030: 09 00 00 00 ....
So the data.table one is longer. This appears to be because the default compression settings are different; specifically, it looks like the two methods use a different "window size" parameter. The data.table code uses a windowBits value of 31 (15 + 16), which makes zlib include a gzip header and trailing checksum in the output, but R.utils::gzip goes through the base R gzfile() function, which uses a windowBits value of -15 (-MAX_WBITS), and that negative value tells zlib not to write its own header or trailing checksum. So I think that accounts for the extra bytes in the data.table output.
Because different pipelines can use different compression levels, checksums, and gzip headers, it is not necessarily the case that you will get the same checksum for compressed versions of the same data file. So it is possible for the data inside to be identical while the actual compressed files differ.
Since these settings are part of the C code of the package and of base R, this is not something you will be able to change from R code. It is not possible for these two different methods to return identical output.
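You can, however, confirm from R that the payloads match; a sketch (fread() decompresses .gz input via R.utils, which is already loaded above):

all.equal(fread("r-utils.csv.gz"), fread("datatable-v1.csv.gz"))
# expected: TRUE, matching the md5sum check on the uncompressed files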
I'm coding an app that uses the serial port, and while debugging I have been compelled to work with the low-level (link control) protocol.
And here my problems began.
Sniffer gives me values:
IOCTL_SERIAL_SET_BAUD_RATE 80 25 00 00 means baud rate 9600, while 00 c2 01 00 means 115200. How is one supposed to guess that?
IOCTL_SERIAL_SET_TIMEOUTS 32 00 00 00 05 00 00 00 00 00 00 00 60 09 00 00 00 00 00 00 - what does this mean? What are the values, and what is the range of admissible values? I have read MSDN ("Setting Read and Write Timeouts for a Serial Device", for example), but it names no actual values. What should I read? How do I make sense of the sniffer data, and how do I control these settings?
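One way to see the pattern: these payloads decode as plain little-endian 32-bit integers. A sketch in R (interpreting the 20 timeout bytes as the five ULONG fields of the SERIAL_TIMEOUTS structure is an assumption about this particular capture):

readBin(as.raw(c(0x80, 0x25, 0x00, 0x00)), "integer", endian = "little")
# expected: 9600
readBin(as.raw(c(0x00, 0xc2, 0x01, 0x00)), "integer", endian = "little")
# expected: 115200
# assuming SERIAL_TIMEOUTS field order: ReadIntervalTimeout,
# ReadTotalTimeoutMultiplier, ReadTotalTimeoutConstant,
# WriteTotalTimeoutMultiplier, WriteTotalTimeoutConstant
readBin(as.raw(c(0x32, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
                 0x00, 0x00, 0x00, 0x00, 0x60, 0x09, 0x00, 0x00,
                 0x00, 0x00, 0x00, 0x00)), "integer", n = 5, endian = "little")
# expected: 50 5 0 2400 0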
Why does the following return this error?
> x <- as.bigz(5)
> y <- ifelse(1,x,0)
Error in ifelse(1, x, 0) :
incompatible types (from raw to logical) in subassignment type fix
I can get around it by doing
> x <- as.bigz(5)
> y <- as.bigz(ifelse(1,as.character(x),0))
It seems to have something to do with the fact that
> as.raw(5)
[1] 05
but
> as.raw(as.bigz(5))
[1] 01 00 00 00 01 00 00 00 01 00 00 00 05 00 00 00
Which suggests that ifelse() is doing an as.raw automatically.
Still though, if
> y <- as.raw(as.bigz(5))
> y
[1] 01 00 00 00 01 00 00 00 01 00 00 00 05 00 00 00
is possible, what is the difference?
Basically this means there is no ifelse.bigz method currently defined. base::ifelse doesn't understand bigz objects.
Instead, use if ... else, since if (bigz_x [relational operator] bigz_y) will work: the relational operators do have bigz methods, so the condition returns a logical value that if can work with.
Rgames> if(1) x else 0
Big Integer ('bigz') :
[1] 5
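For example (a sketch; the comparison dispatches to gmp's bigz method and returns an ordinary logical):

library(gmp)
x <- as.bigz(5)
if (x > 3) x else as.bigz(0)   # x > 3 uses the bigz comparison method
# expected: Big Integer ('bigz') : [1] 5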