I have a number in octal base in string form (see below for an example):
02686a6552f426f08ac0f20ce7dca23e
and I need to transform it into a decimal integer using R. I have tried googling the relevant function, but I could only find functions that either convert decimal to octal such as:
as.octmode()
or convert between hexadecimal and decimal, such as this one from the fBasics package:
.hex.to.dec
I had hoped there would be a function like change.base(string, base_from, base_to), but all I have been able to find is strtoi, called with the following arguments:
strtoi("02686a6552f426f08ac0f20ce7dca23e",base=8)
which gives me an NA value. The documentation says "Convert strings to integers according to the given base using", but it doesn't say whether the base argument is the base we convert from or the base we convert into (I assume the latter, since the example posted above doesn't return a result).
It seems that the PHP function octdec() gives a result:
echo octdec(02686a6552f426f08ac0f20ce7dca23e)
5176
But I do not really know PHP. According to our developers, (decoct(payment_id) + 3) * 7 is the only operation applied to the integer in this case. The result is pushed into Google Analytics, which returns it in the form shown in the example. I wasn't able to find anything about GA doing this by default.
It would be easy to do the conversion mathematically if I had just the number in octal, but since the format looks like what I assume is some kind of a hash representation of the original number, I am clueless.
I need to run this over hundreds of similar records to compare two data sources, so I can't really use a PHP sandbox to do it manually.
Thanks for any help or pointers
The string contains digits above 7 and the letters a-f, so it is a hexadecimal number rather than an octal one. It can be converted into a decimal integer with the Rmpfr package:
library(Rmpfr)
x <- mpfr("02686a6552f426f08ac0f20ce7dca23e", base=16)
x
# 1 'mpfr' number of precision 128 bits
# [1] 3200612827992787429417270296251769406
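On the strtoi() attempt from the question: the base argument is the base the input string is written in, and the result is always an ordinary R integer. The following base-R sketch shows the two ways such a call can end up as NA; the short example strings are made up for illustration.

```r
# `base` names the base of the INPUT string; the result is a regular R integer.
strtoi("777", base = 8L)    # 511
strtoi("ff", base = 16L)    # 255

# A character that is not a valid digit in the given base yields NA ...
strtoi("8", base = 8L)      # NA

# ... and so does a value that does not fit into a 32-bit integer.
strtoi("02686a6552f426f08ac0f20ce7dca23e", base = 16L)   # NA
.Machine$integer.max        # 2147483647
```

So the string from the question fails with base = 8 because of its digits, and with base = 16 because of its size.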
To convert this number into an octal one, the same library can be used:
formatMpfr(x,base=8)
#[1] "23206514524572046741053007440634767121076.000"
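If installing a package is not an option, the hexadecimal-to-decimal step can also be sketched in base R by doing the arithmetic on a vector of decimal digits. This is only an illustration, not a replacement for Rmpfr or gmp: hex_to_dec_string is a hypothetical helper name, and the function assumes its input contains nothing but hexadecimal digits.

```r
# Hypothetical helper: convert a hexadecimal string of any length to a string
# of decimal digits, using schoolbook arithmetic on a little-endian digit vector.
hex_to_dec_string <- function(hex) {
  digits <- strtoi(strsplit(tolower(hex), "")[[1]], base = 16L)
  stopifnot(length(digits) > 0L, !anyNA(digits))  # only 0-9 and a-f allowed

  dec <- 0L                        # decimal digits, least significant first
  for (d in digits) {
    dec <- dec * 16L               # shift the value by one hex digit
    dec[1] <- dec[1] + d           # add the new hex digit
    carry <- 0L
    for (i in seq_along(dec)) {    # normalise every position back to 0..9
      v <- dec[i] + carry
      dec[i] <- v %% 10L
      carry <- v %/% 10L
    }
    while (carry > 0L) {           # grow the number if a carry is left over
      dec <- c(dec, carry %% 10L)
      carry <- carry %/% 10L
    }
  }
  paste(rev(dec), collapse = "")
}

hex_to_dec_string("ff")   # "255"

# Apply it to a whole column of ids (here a one-element example vector).
ids <- c("02686a6552f426f08ac0f20ce7dca23e")
vapply(ids, hex_to_dec_string, character(1), USE.NAMES = FALSE)
```

For the string from the question this should reproduce the value shown in the Rmpfr output above. The result is kept as a character string on purpose, because a value this large does not fit into an R integer or an exact double.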
I have a value in exponential notation, e.g. 3.22122E+23.
In MarkLogic, when I try xs:decimal(3.22122E+23)
I get this error:
[1.0-ml] XDMP-CAST: (err:FORG0001) xs:decimal(xs:double("3.22122E23")) -- Invalid cast: xs:double("3.22122E23") cast as xs:decimal
A lower value, e.g. xs:decimal(3.22122E+18), gives me the correct result, i.e. 3221220000000000000.
I see that this is a decimal overflow, so the value cannot be represented as a decimal data type, but is there any way in MarkLogic to handle and calculate with such huge values?
The same question applies to values with a negative exponent (e.g. 3.22122E-23), where I need to handle and display more than 20 decimal places.
It would be helpful to clarify what kind of logic or calculations you are trying to accomplish and why exactly you need to convert the value to decimal. For example, to "display" the double value, you can use the standard format-number function without any conversion to decimal:
let $x := xs:double(3.22122E+23)
return format-number($x,"#,##0.00")
yields:
322,122,000,000,000,000,000,000.00
See https://docs.marklogic.com/fn:format-number for details regarding fn:format-number() usage.
See https://help.marklogic.com/Knowledgebase/Article/View/487/0/marklogic-server-and-the-decimal-type-implementation for details of the limitations of the xs:decimal type.
I am currently working with ngrams, which are stored in a data.table in a numeric format, where each word in a vocabulary is given a unique 5 digit number and a single 4-gram looks like this :
10000100001017060484
The reason for storing ngrams in this manner is that numeric objects take up much less space in R. Hence, I am working with some large numbers, which I occasionally need to convert to character and back to do some string manipulation. Today, I noticed that my RStudio does not seem to store large numbers correctly. For example:
as.numeric(125124313242345145234513234432)
[1] 125124313242345143744028208602
As you can see, the number returned is very different from the one I entered. The only global option I used was:
options(scipen=999)
Can someone explain why this is happening and how I can fix it?
Regards,
Kamran.
If you run .Machine$integer.max, it returns 2147483647, which means that R's integer type cannot hold values greater than 2147483647. If you run .Machine$double.xmax, you get 1.797693e+308, which is the largest value a double can represent. A double stores a number as a significand (1.797...) and an exponent (308) separately, and the significand has only 53 bits, so only about 15 to 17 significant decimal digits are preserved. Anything longer is rounded to the nearest representable value.
?.Machine
http://sites.stat.psu.edu/~drh20/R/html/base/html/zMachine.html
In your case, if you append L to the number (the way to tell R that you want an integer), you get the following:
as.numeric(125124313242345145234513234432L)
[1] 1.251243e+29
Warning message:
non-integer value 125124313242345145234513234432L qualified with L; using numeric value
So the output you are seeing is a consequence of these limits on integers and doubles in R.
To overcome this, you can convert the number to a bigz using the gmp library. Pass it as a string so that no digits are lost on the way in:
library(gmp)
as.bigz("125124313242345145234513234432")
Output:
Big Integer ('bigz') :
[1] 125124313242345145234513234432
This is my understanding of how R stores numbers. It may not be perfect, but this is how I see it.
You may also want to look at the gmp documentation: https://cran.r-project.org/web/packages/gmp/gmp.pdf
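The precision limit of doubles can be checked directly in base R. The sketch below uses the 4-gram code from the question plus a second code that differs only in the last digit; that second code is made up for illustration.

```r
# Doubles carry a 53-bit significand, so integers are exact only up to 2^53.
.Machine$double.digits          # 53
2^53 == 2^53 + 1                # TRUE: the +1 is lost

# Two different 20-digit n-gram codes end up as the same double.
a <- as.numeric("10000100001017060484")
b <- as.numeric("10000100001017060485")
a == b                          # TRUE
```

A 20-digit code of this size is also larger than 2^63 - 1 (about 9.22e18), so it would not fit into a 64-bit integer type either.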
Sorry for making this an answer, but it's too long for a comment. What happens if you run the code below? On my machine, with scipen = 999, your conversion works fine. Are your ngram numbers really stored as numeric? The code below shows that an error can arise when converting between character and numeric, depending on the settings.
mynumber <- 125124313242345145234513234432
options(scipen = 999)
mynumber == as.numeric(mynumber)
#[1] TRUE
mynumber == as.numeric(as.character(mynumber))
#[1] TRUE
options(scipen = 0)
mynumber == as.numeric(mynumber)
#[1] TRUE
mynumber == as.numeric(as.character(mynumber))
#[1] FALSE
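One way to avoid the character/numeric round trip altogether is to keep each n-gram as a character key and split it into its word ids only when needed. This is a sketch of that idea, not the asker's actual pipeline; it assumes every word id is exactly five digits wide, as described in the question.

```r
# Keep the 4-gram as a character key and split it into four 5-digit word ids.
ngram  <- "10000100001017060484"
starts <- seq(1, nchar(ngram), by = 5)
ids    <- as.integer(substring(ngram, starts, starts + 4))
ids                                   # 10000 10000 10170 60484

# Rebuild the key; "%05d" pads ids to five digits so the widths stay fixed.
key <- paste(sprintf("%05d", ids), collapse = "")
identical(key, ngram)                 # TRUE
```

Each id fits comfortably into an R integer, so four integer columns in the data.table are another option if memory matters more than having a single key.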
I'm running into some problems with the R function as.character() and paste(): they do not give back what they're being fed...
as.character(1415584236544311111)
## [1] "1415584236544311040"
paste(1415584236544311111)
## [1] "1415584236544311040"
What could be the problem, and is there a workaround to paste my number as a string?
update
I found that using the bit64 library allowed me to retain the extra digits I needed with the function as.integer64().
Remember that numbers are stored in a fixed number of bytes, determined by the hardware you are running on. Can you show that your very big integer is treated properly by normal arithmetic operations? If not, you're probably trying to store a number too large for the number of bytes your R installation uses. The number you see is just what could fit.
You could try storing the number as a double which is technically less precise but can store larger numbers in scientific notation.
EDIT
Consider the answers to the question "long/bigint/decimal equivalent datatype in R", which list solutions including arbitrary-precision packages.
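The loss happens when the literal is parsed, before as.character() or paste() ever see it. A small base-R sketch shows this; the second number and the "id_" prefix are made up for illustration.

```r
# Both literals are parsed to the same double, so the last digits are
# already gone before any conversion to character takes place.
x <- 1415584236544311111
y <- 1415584236544311112
x == y                      # TRUE

# If the value is an identifier rather than a quantity, keep it as a string
# from the start (e.g. read the column as character).
id <- "1415584236544311111"
paste0("id_", id)           # "id_1415584236544311111"
nchar(id)                   # 19
```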
On loading a yaml file with values such as 25.0, the .0 is ignored and what I get is 25. Is it possible to force yaml to consider the value as it is without manipulating the data? I have tried enclosing the values in single/double quotes, but that does not work.
[Edit]: I am using the yaml parser package for the R programming language. The data type returned is double. If I set the value to 25.2, I get back the same value. How can I force YAML/R to read the information in the YAML file as it is?
Your problem is that the parser recognises that these are floating point numbers and in R there is no difference between 25.0 and 25. Try this for example:
identical(25.0, 25)
25.0 and 25 are just two different representations of the same floating point number. If you want to retain the form in which the data is supplied you will have to read them in as strings (which you can later convert to numeric if you need to perform calculations). You can do this with a handler:
library(yaml)
yaml.load("25.0", handlers=list("float#fix"=function(x) as.character(x)))
Maybe this will help: http://tolstoy.newcastle.edu.au/R/help/06/05/28016.html
It suggests changing the digits setting and possibly rounding the numbers as well, to avoid too many decimal places.
options(digits=2)
format(round(x, 2), nsmall = 2)
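If the trailing .0 is needed only for display, the value can stay numeric and be formatted on output. A minimal base-R sketch, assuming one decimal place is what should be shown:

```r
x <- 25.0
identical(x, 25)          # TRUE: the same double either way

# Choose the number of decimals when turning the value into text.
sprintf("%.1f", x)        # "25.0"
format(x, nsmall = 1)     # "25.0"
```

This does not recover how the value was written in the YAML file. It only fixes the number of decimals at display time.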
How do I prevent R from rounding?
For example,
> a<-893893084082902
> a
[1] 8.93893e+14
I am losing a lot of information there. I have tried signif() and it doesn't seem to do what I want.
Thanks in advance!
(This came up as a result of a student of mine trying to determine how long it would take to count to a quadrillion at a number per second)
It's not rounding; it's just the default format for printing large (or small) numbers.
> a <- 893893084082902
> sprintf("%f",a)
[1] "893893084082902.000000"
See the "digits" section of ?options for a global solution.
This would show you more digits for all numbers:
options(digits=15)
Or, if you want it just for a:
print(a, digits=15)
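Another option that leaves the global settings alone is format() with scientific = FALSE. The sketch below also works through the counting question mentioned above, assuming a year of 365.25 days.

```r
a <- 893893084082902

# Fixed notation for a single value, optionally with thousands separators.
format(a, scientific = FALSE)                    # "893893084082902"
format(a, big.mark = ",", scientific = FALSE)    # "893,893,084,082,902"

# Counting to a quadrillion at one number per second, expressed in years.
seconds <- 1e15
years   <- seconds / (60 * 60 * 24 * 365.25)
format(round(years), big.mark = ",", scientific = FALSE)
```

The last line comes out at roughly 31.7 million years.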
To get around R's integer limits, you could use the gmp package for R: http://cran.r-project.org/web/packages/gmp/index.html
I discovered this package when playing with the Project Euler challenges and needing to do factorizations. But it also provides functions for big integers.
EDIT:
It looks like this question was not really one about big integers as much as it was about rounding. But for the next space traveler who comes this way, here's an example of big integer math with gmp:
Try to multiply 1e500 * 1e500 using base R:
> 1e500 * 1e500
[1] Inf
To do the same with gmp, you first need to create a big integer object, which gmp calls a bigz. Passing as.bigz() an int or double holding a really big number will not work, because the whole reason for using gmp is that R can't hold a number this big. So we pass it a string instead. The following code starts with some string manipulation to build that string:
library(gmp)
o <- paste(rep("0", 500), collapse="")
a <- as.bigz(paste("1", o, sep=""))
mul.bigz(a, a)
You can count the zeros if you're so inclined.
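As a side note on the base-R attempt, the overflow happens even before the multiplication, because 1e500 is already beyond what a double can hold. A short base-R check:

```r
.Machine$double.xmax      # about 1.797693e+308, the largest finite double
is.infinite(1e500)        # TRUE: the literal itself overflows to Inf
is.infinite(1e308 * 10)   # TRUE: so does arithmetic that passes the limit
1e308 * 10 == 1e500       # TRUE: both are simply Inf
```

This is why the big number has to be handed to as.bigz() as a string.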