Convert binary string to binary or decimal value - r

Is there any function to convert binary string into binary or decimal value?
If I have a binary string 000101, what should I do to convert it into 5?

You could use the packBits function (in the base package). Bear in mind that this function requires very specific input.
(yy <- intToBits(5))
# [1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# [26] 00 00 00 00 00 00 00
# Note that there are 32 bits and the order is reversed from your example
class(yy)
# [1] "raw"
packBits(yy, "integer")
# [1] 5
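To feed the string from the question to packBits, it first has to be put into that 32-element, least-significant-bit-first form. A minimal sketch, assuming the input has at most 31 significant bits; the helper name bin_str_to_int is hypothetical and not part of base R:

```r
# Hypothetical helper: convert a binary string to an integer via packBits().
bin_str_to_int <- function(bin_str) {
  bits <- as.integer(strsplit(bin_str, "")[[1]])   # "000101" -> 0 0 0 1 0 1
  stopifnot(length(bits) <= 32, all(bits %in% 0:1))
  # packBits() wants exactly 32 bits, least significant first,
  # so reverse the digits and zero-pad on the right.
  padded <- c(rev(bits), rep(0L, 32 - length(bits)))
  packBits(as.raw(padded), "integer")
}

bin_str_to_int("000101")
# [1] 5
```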
There is also the strtoi function (also in the base package):
strtoi("00000001001100110000010110110111", base = 2)
# [1] 20121015
strtoi("000101", base = 2)
# [1] 5
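One caveat: strtoi returns an R integer, so anything that does not fit in 32 signed bits comes back as NA. A rough sketch of a fallback that accumulates into a double instead (exact only up to 53 significant bits); bin_to_double is a hypothetical helper name, not a base function:

```r
# Hypothetical helper: binary strings -> doubles, exact up to 53 significant bits.
bin_to_double <- function(s) {
  vapply(strsplit(s, "", fixed = TRUE), function(ch) {
    bits <- as.numeric(ch)
    sum(bits * 2^rev(seq_along(bits) - 1))   # weight each digit by its power of 2
  }, numeric(1))
}

strtoi(strrep("1", 32), base = 2)   # NA: 2^32 - 1 exceeds the integer range
bin_to_double(strrep("1", 32))
# [1] 4294967295
```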

Here is what you can try:
binStr <- "00000001001100110000010110110111" # 20121015
(binNum <- 00000001001100110000010110110111) # 20121015
[1] 1.0011e+24
binVec <- c(1,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1) # 2670721
shortBin <- 10011010010 # 1234
BinToDec <- function(x)
sum(2^(which(rev(unlist(strsplit(as.character(x), "")) == 1))-1))
BinToDec(binStr)
[1] 20121015
BinToDec(binNum)
[1] 576528
BinToDec(binVec)
[1] 2670721
BinToDec(shortBin)
[1] 1234
That is, you can pass both strings (thanks to as.character()) and numeric binary values, but large numbers such as binNum cause problems: R parses the literal as a decimal double (printed as 1.0011e+24 above), so digits are lost before as.character() ever sees them. As I understand it, you also want to convert a binary string to a numeric binary value, but unfortunately there is no such data type, at least in base R.
Edit: Now BinToDec also accepts binary vectors, which might be a solution for large numbers. Function digitsBase() from package sfsmisc returns such a vector:
(vec <- digitsBase(5, base= 2, 10))
Class 'basedInt'(base = 2) [1:1]
[,1]
[1,] 0
[2,] 0
[3,] 0
[4,] 0
[5,] 0
[6,] 0
[7,] 0
[8,] 1
[9,] 0
[10,] 1
BinToDec(vec)
[1] 5
Finally, another possibility is the package compositions, for example:
(x <- unbinary("10101010"))
[1] 170
(y <- binary(x))
[1] "10101010"

base::strtoi(binary_string, base = 2)

This function converts a digit string in an arbitrary base to its decimal value; base = 2 is binary, and so on. Because each character is parsed with as.numeric(), it only works for bases up to 10.
base2decimal = function(base_number, base = 2) {
split_base = strsplit(as.character(base_number), split = "")
return(sapply(split_base, function(x) sum(as.numeric(x) * base^(rev(seq_along(x) - 1)))))
}
> base2decimal(c("000101", "00000001001100110000010110110111"))
[1] 5 20121015
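The base-10 ceiling comes from as.numeric(), which cannot parse letters. A possible extension, sketched under the assumption that digits beyond 9 are written a-z (case-insensitive) as in hexadecimal; any_base_to_decimal is a hypothetical name:

```r
# Hypothetical generalisation of base2decimal: maps 0-9 and a-z to 0..35.
any_base_to_decimal <- function(x, base = 2) {
  symbols <- c(as.character(0:9), letters)
  vapply(strsplit(tolower(as.character(x)), ""), function(ch) {
    d <- match(ch, symbols) - 1                # digit value of each character
    stopifnot(!anyNA(d), all(d < base))        # reject digits invalid for this base
    sum(d * base^rev(seq_along(d) - 1))
  }, numeric(1))
}

any_base_to_decimal("000101", base = 2)
# [1] 5
any_base_to_decimal("ff", base = 16)
# [1] 255
```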

If you have a binary string, all of the prior answers are great. I often find myself in situations where I want to encode a combination of binary vectors. The logic of translating a combination of 0s and 1s to an integer is always the same:
bincount <- function(B, base=2) { return(B %*% base^seq(0,ncol(B)-1)) }
Where B is a matrix, and each column is a binary vector.
Example:
isBig <- c(0, 1, 0, 1)
isRed <- c(0, 0, 1, 1)
B = cbind(isBig,isRed)
bincount(B)
# 0 1 2 3
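The same weights can be used to go back from the integer codes to the binary columns. A small sketch; binsplit is a hypothetical name, and bincount is restated here with drop() so that it returns a plain vector rather than a one-column matrix:

```r
bincount <- function(B, base = 2) drop(B %*% base^seq(0, ncol(B) - 1))

# Hypothetical inverse: column k holds digit k (least significant first).
binsplit <- function(codes, ncol, base = 2) {
  sapply(seq_len(ncol) - 1, function(k) (codes %/% base^k) %% base)
}

isBig <- c(0, 1, 0, 1)
isRed <- c(0, 0, 1, 1)
B <- cbind(isBig, isRed)

codes <- bincount(B)
codes
# [1] 0 1 2 3
binsplit(codes, ncol = 2)   # recovers the isBig and isRed columns
```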

Related

How to represent 4 bit integers properly?

I need to convert a matrix to a hex file output. Each entry in the matrix needs to get translated to a 4 bit hex digit (8) and output in a single dimension array.
> matrix(c(0,0,0,5,0,0,5,5,0,0,5,0),nrow=3,ncol=4,byrow=T)
[,1] [,2] [,3] [,4]
[1,] 0 0 0 5
[2,] 0 0 5 5
[3,] 0 0 5 0
This is a 3 row, 4 column matrix with mostly 0s and some 5s. My desired output should be something similar to
#> as.raw(c(0,8,0,136,0,128))
> as.raw(solution)
[1] 00 08 00 88 00 80
I was trying to do some simple
> sidewaysraw<-as.raw(ifelse(mymat==5, 8,0))
but as.raw() turns each 8 into a whole byte, so it's always an 0x08. I don't see a slick way to translate 55s to 0x88s, 05s to 0x08s and 50s to 0x80s...
Is there a smooth way to get R to work with 4 bit integers?
It seems like you could do some matrix multiplication to help. First we can define a "translation matrix"
digits <- matrix(c(128,8,0,0,0,0,128,8), nrow=4, ncol=2)
Then, with your matrix stored in dd, you can get your numbers out with
(dd==5) %*% digits
# [,1] [,2]
# [1,] 0 8
# [2,] 0 136
# [3,] 0 128
and then extract them in the right order with a transposition
as.raw(t((dd==5) %*% digits))
# [1] 00 08 00 88 00 80
This should be efficient and doesn't bother with string manipulation.
Assuming I understood your question correctly, this solves your problem:
mat <- matrix(c(0,0,0,5,0,0,5,5,0,0,5,0),nrow=3,ncol=4,byrow=T)
mat_new <- matrix(t(mat),ncol=2,byrow=T) #reformat to 2 columns
vec <- apply(mat_new,1,function(row)
{
num <- paste0(row,collapse="") #collapse the rows
if(num == "00") return(as.raw(0)) #switch the resulting char
else if(num == "05") return(as.raw(8))
else if(num == "50") return(as.raw(128))
else if(num == "55") return(as.raw(136))
})
vec
[1] 00 08 00 88 00 80
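R has no 4-bit integer type; raw is always a full byte. Another way to look at the problem, sketched here with the matrix from the question, is to treat consecutive entries as high and low nibbles and combine each pair arithmetically as high * 16 + low. The variable names are illustrative only:

```r
mymat <- matrix(c(0,0,0,5, 0,0,5,5, 0,0,5,0), nrow = 3, ncol = 4, byrow = TRUE)

# Row-major sequence of 4-bit values: 5 -> 8, everything else -> 0.
nibbles <- ifelse(t(mymat) == 5, 8, 0)

hi <- nibbles[c(TRUE, FALSE)]   # 1st, 3rd, 5th, ... entries: high nibbles
lo <- nibbles[c(FALSE, TRUE)]   # 2nd, 4th, 6th, ... entries: low nibbles

as.raw(hi * 16 + lo)
# [1] 00 08 00 88 00 80
```

This assumes the total number of entries is even, so that every high nibble has a partner.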

Why do the hash values differ for NaN and Inf - Inf?

I use this hash function a lot, i.e. to record the value of a dataframe. Wanted to see if I could break it. Why aren't these hash values identical?
This requires the digest package.
Plain text output:
> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
Additional weirdness. Calculations that equal NaN have identical hash values, but NaN's hash values are not equivalent:
> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...
Following up on @Jozef's answer: note the one byte that differs between the two outputs (ff vs. 7f, just before f8) ...
> base::serialize(Inf-Inf,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00
Alternatively, using pryr::bytes() ...
> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"
The Wikipedia article on floating point format/NaNs says:
Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:
sign = either 0 or 1.
biased exponent = all 1 bits.
fraction = anything except all 0 bits (since all 0 bits represents infinity).
The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf (FF F8) is 1111 1111 1111 1000 (sign=1; exponent is all ones, as required; fraction starts with 1000) whereas NaN (7F F8) is 0111 1111 1111 1000 (the same, but with sign=0).
To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...
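The sign/exponent/fraction split can be inspected directly in base R. A minimal sketch, assuming 8-byte IEEE 754 doubles; double_bits is a hypothetical helper, not a function from base R or pryr:

```r
# Hypothetical helper: split one double into its sign, exponent and fraction bits.
double_bits <- function(x) {
  stopifnot(is.double(x), length(x) == 1)
  bytes <- writeBin(x, raw(), endian = "big")   # 8 bytes, most significant first
  # rawToBits() lists each byte's bits least significant first, so reverse
  # the bytes going in and the bits coming out to get an MSB-first sequence.
  bits <- rev(as.integer(rawToBits(rev(bytes))))
  list(sign     = bits[1],
       exponent = paste(bits[2:12], collapse = ""),
       fraction = paste(bits[13:64], collapse = ""))
}

str(double_bits(NaN))
str(double_bits(Inf - Inf))   # the sign bit here may differ by platform
```

Both results should show an all-ones exponent and a non-zero fraction; only the sign bit is expected to differ on platforms that behave like the one in the question.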
It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:
single.NA: logical indicating if there is conceptually just one numeric
‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
patterns.
Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...
This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:
identical(
base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
base::serialize(NaN, connection = NULL, ascii = FALSE)
)
# [1] FALSE
Even though
identical(Inf-Inf, NaN)
# [1] TRUE
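Besides ascii = TRUE, one possible workaround is to normalise NaNs before serialising or hashing, so that every NaN carries the same bit pattern. A sketch only; canonical_nan is a hypothetical helper and it handles plain double vectors, not NaNs nested inside lists or data frames:

```r
# Hypothetical helper: overwrite every NaN with R's NaN constant so that all
# NaNs share one bit pattern. NA is left alone because is.nan(NA_real_) is FALSE.
canonical_nan <- function(x) {
  if (is.double(x)) x[is.nan(x)] <- NaN
  x
}

a <- Inf - Inf
b <- NaN
identical(a, b)
# [1] TRUE

# After normalisation the serialised bytes should match as well.
identical(serialize(canonical_nan(a), connection = NULL),
          serialize(canonical_nan(b), connection = NULL))
```

The normalised value could then be passed to digest() in place of the raw one.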

Equivalent bitget function in R

Is there a function in R that performs the same operation as bitget in MATLAB/Octave:
bitget
From the bitget help page
Return the status of bit(s) n of unsigned integers in A the
lowest significant bit is n = 1.
bitget (100, 8:-1:1)
⇒ 0 1 1 0 0 1 0 0
So if you want to get the bit values for an integer in R, you can do
intToBits(100)[8:1]
# [1] 00 01 01 00 00 01 00 00
That technically returns a raw vector, so if you want just a numeric vector, do
as.numeric(intToBits(100)[8:1])
# [1] 0 1 1 0 0 1 0 0
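Wrapped up as a function, this could look like the sketch below. The name bit_get is hypothetical; since intToBits works on 32-bit integers, the sketch only covers bit positions 1 to 32 of a single value:

```r
# Hypothetical bitget-style helper: bit n of x, where n = 1 is the
# least significant bit (same convention as bitget in Octave).
bit_get <- function(x, n) {
  stopifnot(length(x) == 1, all(n >= 1 & n <= 32))
  as.integer(intToBits(x)[n])
}

bit_get(100, 8:1)
# [1] 0 1 1 0 0 1 0 0
```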

bigz variable type from package gmp with ifelse()

Why does the following return this error?
> x <- as.bigz(5)
> y <- ifelse(1,x,0)
Error in ifelse(1, x, 0) :
incompatible types (from raw to logical) in subassignment type fix
I can get around it by doing
> x <- as.bigz(5)
> y <- as.bigz(ifelse(1,as.character(x),0))
It seems to have something to do with the fact that
> as.raw(5)
[1] 05
but
> as.raw(as.bigz(5))
[1] 01 00 00 00 01 00 00 00 01 00 00 00 05 00 00 00
This suggests that ifelse() is doing an "as.raw" automatically.
Still though, if
> y <- as.raw(as.bigz(5))
> y
[1] 01 00 00 00 01 00 00 00 01 00 00 00 05 00 00 00
is possible, what is the difference?
Basically this means there is no ifelse.bigz method currently defined. base::ifelse doesn't understand bigz objects.
Instead, use if ... else, since if(bigz_x [relational operator] bigz_y) will work: the relational operators do have bigz methods and therefore return a logical value that if can work with.
Rgames> if(1) x else 0
Big Integer ('bigz') :
[1] 5

Can I get the byte representation of an R float?

I'm trying to read in a complicated data file that has floating point values. Some C code has been supplied that handles this format (Met Office PP file) and it does a lot of bit twiddling and swapping. And it doesn't work. It gets a lot right, like the size of the data, but the numerical values in the returned matrix are nonsensical, have NaNs and values like 1e38 and -1e38 liberally sprinkled.
However, I have a binary exe ("convsh") that can convert these to netCDF, and the netCDFs look fine - nice swirly maps of wind speed.
What I'm thinking is that the bytes of the PP file are being read in in the wrong order. If I could compare the bytes of the floats returned correctly in the netCDF data with the bytes in the floats returned wrongly from the C code, then I might figure out the correct swappage.
So is there a plain R function to dump the four (or eight?) bytes of a floating point number? Something like:
> as.bytes(pi)
[1] 23 54 163 73 99 00 12 45 # made up values
Searches for "bytes" and "float" and "binary" haven't helped.
It's trivial in C; I could probably have written it in the time it took me to write this...
rdyncall might give you what you're looking for:
library(rdyncall)
as.floatraw(pi)
# [1] db 0f 49 40
# attr(,"class")
# [1] "floatraw"
Or maybe writeBin(pi, raw(8))?
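To expand on the writeBin idea: passing a raw vector as the connection makes writeBin return the bytes, and the size and endian arguments control the width and byte order. A short sketch for comparing byte orders, using only base R:

```r
bytes_be <- writeBin(pi, raw(), endian = "big")               # 8-byte double, big-endian
bytes_le <- writeBin(pi, raw(), endian = "little")            # same value, little-endian
float_le <- writeBin(pi, raw(), size = 4, endian = "little")  # 4-byte float

bytes_be
# [1] 40 09 21 fb 54 44 2d 18
float_le
# [1] db 0f 49 40

# Reading with the matching byte order recovers the value; reading the same
# bytes with the other byte order gives a different, meaningless number.
readBin(bytes_be, "double", endian = "big")
# [1] 3.141593
readBin(bytes_be, "double", endian = "little")
```

Dumping a known-good value and a suspect value this way should make a byte-swap problem visible at a glance.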
Yes, that must exist in the serialization code because R merrily sends stuff across the wire, taking care of endianness too. Did you look at eg Rserve using it, or how digest passes the char representation to chosen hash functions?
After a quick glance at digest.R:
R> serialize(pi, connection=NULL, ascii=TRUE)
[1] 41 0a 32 0a 31 33 34 39 31 34 0a 31 33 31 38 34 30 0a
[19] 31 34 0a 31 0a 33 2e 31 34 31 35 39 32 36 35 33 35 38
[37] 39 37 39 33 0a
and
R> serialize(pi, connection=NULL, ascii=FALSE)
[1] 58 0a 00 00 00 02 00 02 0f 02 00 02 03 00 00 00 00 0e
[19] 00 00 00 01 40 09 21 fb 54 44 2d 18
R>
That might get you going.
Come to think about it, this includes header meta-data.
The package mcga (machine-coded genetic algorithms) includes some functions for bytes-to-double and doubles-to-byte conversions. For handling the bytes of pi, you can use DoubleToBytes like:
> DoubleToBytes(pi)
[1] 24 45 68 84 251 33 9 64
For converting bytes to double again, BytesToDouble() can be used instead:
> BytesToDouble(c(24,45,68,84,251,33,9,64))
[1] 3.141593
Links:
CRAN page of mcga
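For comparison, the byte values shown above are the little-endian bytes of the double, so the same round trip can be sketched in base R without an extra package (this is an equivalent written with writeBin and readBin, not mcga's implementation):

```r
# Double -> byte values (little-endian), as integers 0..255
as.integer(writeBin(pi, raw(), endian = "little"))
# [1]  24  45  68  84 251  33   9  64

# Byte values -> double
byte_values <- c(24, 45, 68, 84, 251, 33, 9, 64)
readBin(as.raw(byte_values), "double", endian = "little")
# [1] 3.141593
```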
