How do I prevent R from rounding?
For example,
> a<-893893084082902
> a
[1] 8.93893e+14
I am losing a lot of information there. I have tried signif() and it doesn't seem to do what I want.
Thanks in advance!
(This came up as a result of a student of mine trying to determine how long it would take to count to a quadrillion at a number per second)
It's not rounding; it's just the default format for printing large (or small) numbers.
a <- 893893084082902
> sprintf("%f",a)
[1] "893893084082902.000000"
See the "digits" section of ?options for a global solution.
This would show you more digits for all numbers:
options(digits=15)
Or, if you want it just for a:
print(a, digits=15)
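If you only need the printed form, base R's format() with scientific notation disabled (or sprintf with "%.0f") also shows every digit of the integer part:

```r
a <- 893893084082902

# Force fixed (non-scientific) notation; the full integer part is shown.
format(a, scientific = FALSE)
# [1] "893893084082902"

# sprintf with %.0f does the same, without the trailing ".000000" of %f.
sprintf("%.0f", a)
# [1] "893893084082902"
```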
To get around R's integer limits, you could use the gmp package for R: http://cran.r-project.org/web/packages/gmp/index.html
I discovered this package when playing with the Project Euler challenges and needing to do factorizations. But it also provides functions for big integers.
EDIT:
It looks like this question was not really one about big integers as much as it was about rounding. But for the next space traveler who comes this way, here's an example of big integer math with gmp:
Try to multiply 1e500 by 1e500 in base R:
> 1e500 * 1e500
[1] Inf
To do the same with gmp, you first need to create a big-integer object, which gmp calls bigz. Passing as.bigz() an int or double that holds a really big number won't work, because the whole reason we're using gmp is that R can't hold a number that big. Instead we pass it a string, so the following code starts with string manipulation to build that big string:
library(gmp)
o <- paste(rep("0", 500), collapse="")
a <- as.bigz(paste("1", o, sep=""))
mul.bigz(a, a)
You can count the zeros if you're so inclined.
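If you'd rather not count by hand, the zeros can be tallied programmatically; a small sketch, assuming gmp is installed, using nchar() on the product's string form:

```r
library(gmp)

o <- paste(rep("0", 500), collapse = "")
a <- as.bigz(paste("1", o, sep = ""))   # the bigz value 1e500
p <- mul.bigz(a, a)                     # 1e500 * 1e500 = 1e1000

# "1" followed by 1000 zeros -> 1001 characters in total
nchar(as.character(p))
# [1] 1001
```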
Related
Given two sequences (such as DNA sequences) of equal length, I want to be able to find the mutations between them - both type and index within the sequence. As an example, if I fed in the sequences AGGCTAC and AGCCTTC, I want to get out a list like G3C, A6T.
I can get the number of differences just fine:
seqs <- Biostrings::readAAStringSet("PSE-1_Round20.fas")
nmut <- adist(seqs[1], seqs)
But I can't think of a more elegant way to get the positions than just looping, which seems very kludgy - and I'd like to take this as an opportunity to learn instead.
I'm working with the Biostrings package in R, but there don't seem to be any special tools in that package for what I want to do, and I think any solution that works for generic strings should also work for me. In fact, if there's a more elegant solution in Python or bash scripting, I'd accept that too.
There seem to be multiple packages that should do this. One is the findMutations function in the adegenet package.
As for the string comparison question, see this question. Here's a function that will work if the strings are the same length:
mutations <- function(str1, str2) {
  str1vec <- unlist(strsplit(str1, ""))
  str2vec <- unlist(strsplit(str2, ""))
  iMut <- which(str1vec != str2vec)  # positions where the two strings differ
  paste0(str1vec[iMut], iMut, str2vec[iMut])
}
> mutations("AGGCTAC", "AGCCTTC")
[1] "G3C" "A6T"
I am currently working with ngrams, which are stored in a data.table in a numeric format, where each word in a vocabulary is given a unique 5-digit number, and a single 4-gram looks like this:
10000100001017060484
The reason for storing ngrams in this manner is that numeric objects take up much less space in R. Hence I am working with some large numbers, which I occasionally need to convert to character and back to do some string manipulation. Today I noticed that RStudio does not seem to store large numbers correctly. For example:
as.numeric(125124313242345145234513234432)
[1] 125124313242345143744028208602
As you can see, the top number is very different from the bottom one. The only global option I have set is:
options(scipen=999)
Can someone explain why is this happening and how can I fix it?
Regards,
Kamran.
If you run .Machine$integer.max, it returns 2147483647, which means base R cannot handle integers greater than 2147483647 by default. If you run .Machine$double.xmax, you get 1.797693e+308, the largest double-precision floating-point number R can represent. A double is stored as an exponent (the 308) and a significand (the 1.797693...), two separate parts of the representation.
?.Machine
http://sites.stat.psu.edu/~drh20/R/html/base/html/zMachine.html
In your case, if you try to append L (the way of telling R that you want to store something as an integer) to the number, you will get something like this:
as.numeric(125124313242345145234513234432L)
[1] 1.251243e+29
Warning message:
non-integer value 125124313242345145234513234432L qualified with L; using numeric value
Hence you can see that, because of these limitations on storing integers and doubles, R produces this outcome.
To overcome this, you can convert the number into a bigz object using the gmp library:
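The practical cutoff for exact integers stored in a double is 2^53; beyond it, adjacent representable values are more than 1 apart, which is what mangles large numeric keys. A quick base-R check:

```r
.Machine$integer.max   # 2147483647, the largest base-R integer
.Machine$double.xmax   # ~1.797693e+308, the largest finite double

# Doubles carry 53 bits of significand, so integers are exact
# only up to 2^53:
2^53 == 2^53 + 1       # TRUE: the + 1 is lost to rounding
2^53 - 1 == 2^53 - 2   # FALSE: below 2^53, integers are still exact
```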
as.bigz("125124313242345145234513234432")
Output:
Big Integer ('bigz') :
[1] 125124313242345145234513234432
This is my understanding of how R stores numbers; it might not be perfect, but this is how I see it.
You may choose to see the gmp documentation: https://cran.r-project.org/web/packages/gmp/gmp.pdf
Sorry for making this an answer, but it's too long for a comment. What happens if you run the code below? On my machine, with scipen = 999, your conversion works fine. Have you really stored your ngram numbers as numeric? In the code below you can see that a potential error might arise from converting between character and numeric, depending on the settings.
mynumber <- 125124313242345145234513234432
options(scipen = 999)
mynumber == as.numeric(mynumber)
#[1] TRUE
mynumber == as.numeric(as.character(mynumber))
#[1] TRUE
options(scipen = 0)
mynumber == as.numeric(mynumber)
#[1] TRUE
mynumber == as.numeric(as.character(mynumber))
#[1] FALSE
I have a number in octal base in string form (see below for an example):
02686a6552f426f08ac0f20ce7dca23e
and I need to transform it into a decimal integer using R. I have tried googling for the relevant function, but I could only find functions that convert decimal to octal, such as:
as.octmode()
or that convert between hexadecimal and decimal bases, from the fBasics package:
.hex.to.dec
I had hoped for a function like change.base(string, base_from, base_to), but I have only been able to find strtoi, with the following arguments:
strtoi("02686a6552f426f08ac0f20ce7dca23e",base=8)
which gives me an NA value. The documentation states "Convert strings to integers according to the given base", but it doesn't say whether the base argument specifies the base we convert from or the base we convert to (I assume the latter, since the example posted above doesn't produce a result).
It seems that the PHP function octdec() gives a result:
echo octdec(02686a6552f426f08ac0f20ce7dca23e)
5176
But I do not really know PHP. According to our developers, (decoct(payment_id) + 3) * 7 is the only operation applied to the integer in this case. The result is pushed into Google Analytics, which reports it in the example form. I wasn't able to find anything about GA doing this by default.
It would be easy to do the conversion mathematically if I had just the number in octal, but since the format looks like what I assume is some kind of a hash representation of the original number, I am clueless.
I need to run this over hundreds of similar records to compare two data sources so can't really use the php sandbox to do it manually.
Thanks for any help or pointers
The string contains the digits a-f, so it is hexadecimal rather than octal. A hexadecimal number of this size can be converted into a decimal integer with the Rmpfr package:
library(Rmpfr)
x <- mpfr("02686a6552f426f08ac0f20ce7dca23e", base=16)
#> x
# 1 'mpfr' number of precision 128 bits
#[1] 3200612827992787429417270296251769406
To convert this number into an octal one, the same library can be used:
formatMpfr(x,base=8)
#[1] "23206514524572046741053007440634767121076.000"
I'm running into some problems with the R function as.character() and paste(): they do not give back what they're being fed...
as.character(1415584236544311111)
## [1] "1415584236544311040"
paste(1415584236544311111)
## [1] "1415584236544311040"
what could be the problem or a workaround to paste my number as a string?
update
I found that using the bit64 library allowed me to retain the extra digits I needed with the function as.integer64().
Remember that numbers are stored in a fixed number of bytes, based on the hardware you are running on. Can you show that your very big integer is handled properly by normal arithmetic operations? If not, you're probably trying to store a number too large for your R installation's integer size. The number you see is just what could fit.
You could try storing the number as a double which is technically less precise but can store larger numbers in scientific notation.
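You can verify that the literal was already rounded the moment R parsed it as a double; at this magnitude adjacent doubles differ by 256, so even adding 1 changes nothing:

```r
x <- 1415584236544311111   # parsed as a double, not an integer

sprintf("%.0f", x)         # "1415584236544311040", the nearest double
x + 1 == x                 # TRUE: the + 1 is below the rounding step
```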
EDIT
Consider the answers in long/bigint/decimal equivalent datatype in R which list solutions including arbitrary precision packages.
I am reading a csv file with some really big numbers like 1327707999760, but R automatically converts it into 1.32771e+12. I've tried to assign it a double class but it didn't work because it's already a rounded value.
I've checked other posts like Preserving large numbers . People said "It's not in a "1.67E+12 format", it just won't print entirely using the defaults. R is reading it in just fine and the whole number is there." But when I tried to do some arithmetic things on them, it's just not right.
For example:
test[1,8]
[1] 1.32681e+12
test[2,8]
[1] 1.32681e+12
test[2,8]-test[1,8]
[1] 0
But I know they are different numbers!
That's not large. It is merely a representation problem. Try this:
options(digits=22)
options('digits') defaults to 7, which is why you are seeing what you do. All thirteen digits are being read and stored, just not printed by default.
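A quick demonstration that the value survives intact and only the print width changes:

```r
x <- 1327707999760

print(x)                 # 1.327708e+12 with the default digits = 7
print(x, digits = 15)    # 1327707999760: the full value was there all along

# Arithmetic uses the stored value, not the printed one:
x - 1327707999759
# [1] 1
```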
Excel allows custom formats: Format/Cells/Custom and enter #0