Format of rounded large numbers in R

Starting with R (3.3.1 64bit on Windows) I found out that mean() provides too many fractional digits, so I used round(x, 1) to trim those. While that works for smaller numbers, somewhat larger numbers are output in a strange format that does not obey the rounding rules (IMHO):
I see an output of 1.330710e+04. Obviously that number should be 13307.1; in the format shown, there are actually two fractional digits displayed.
Is there a way to get more beautiful formatting? Did I make a mistake?

> format(round(345678998766.01))
[1] "3.45679e+11"
> format(round(345678998766.01), digits = 10)
[1] "345678998766"
> format(round(mean(c(345678998766.01, 345678998766.01))))
[1] "3.45679e+11"
> format(round(mean(c(345678998766.01, 345678998766.01))), digits = 10)
[1] "345678998766"
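To get a fixed number of decimal places rather than a number of significant digits, one option is to format the value explicitly instead of relying on the default printing. This is a minimal sketch using the value from the question (13307.1) and the large value from the transcript above; the variable names `x` and `big` are only illustrative.

```r
x <- 13307.1            # value from the question
big <- 345678998766.01  # value from the transcript above

# Fixed notation with exactly one decimal place
sprintf("%.1f", x)
formatC(x, format = "f", digits = 1)

# Keep at least one decimal place while still using format()
format(round(x, 1), nsmall = 1)

# Switch off scientific notation for a rounded large number
format(round(big), scientific = FALSE)
```

All of these return character strings, so they are meant for display, not for further arithmetic.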

Related

Why does R behave like this?

I use the following multiplication in R (v. R-3.6.1): 115*1.044. I get 120.1. In Excel I get 120.06. By hand I get 120.062.
I set options(digits=4) in R, but I still get the same result: 120.1.
Why does R behave like this? I used to trust it more than Excel, but here Excel seems to be more accurate in what it returns. Is there a way to force R to return the accurate digits I would get if multiplying by hand?
The digits argument of format() refers to the total number of significant digits in the number as a whole (integer and fractional parts together), not to the number of decimal places:
> format(115*1.044, digits = 5)
[1] "120.06"
> format(115*1.044, digits = 4)
[1] "120.1"
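The difference between significant digits and decimal places can be sketched as follows. The variable name `x` is only illustrative, and sprintf() and round() are shown as possible alternatives when a fixed number of decimals is what is wanted.

```r
x <- 115 * 1.044

# digits counts significant digits across the whole number
format(x, digits = 4)
format(x, digits = 5)

# A fixed number of decimal places instead
sprintf("%.2f", x)
format(round(x, 2), nsmall = 2)
```

The stored value is unchanged in every case; only the printed representation differs.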

R include commas and show max dp for numbers

Is there any way to include commas in large numbers, for example to show 1000000 as 1,000,000, and at the same time display the maximum number of decimals for each value? I've looked through some of the existing questions, but there doesn't seem to be an option that does both. I tried
format(1000000, big.mark = ",")
which tends to round off the numbers. If I include the nsmall option, it changes the number of decimal places for all the values. The ideal output for a column of numbers shows all the decimals a value has, and none if it has none, something like this:
1000000 -> 1,000,000
10043.9658 -> 10,043.9658
5005.3 -> 5,005.3
As shown above, a value without decimals is displayed without any, and a value with decimals keeps all the decimals it had to begin with.
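You can use sapply() with format(), setting the digits argument high enough to cover every significant digit in the data and the scientific argument to FALSE. Formatting each element separately keeps the values from being padded to a common number of decimals.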
sapply(c(1000000, 10043.9658, 5005.3), format, big.mark = ",", digits = 12, scientific = FALSE)
[1] "1,000,000" "10,043.9658" "5,005.3"
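The same idea can be wrapped in a small helper. This is only a sketch: the function name `fmt_commas` and the default of 15 significant digits are assumptions made here, not part of the answer above. vapply() is used in place of sapply() so that the result is always a character vector.

```r
# Hypothetical helper; the name and the default of 15 digits are assumptions.
fmt_commas <- function(x, digits = 15) {
  vapply(
    x,
    # Format one value at a time so each keeps its own number of decimals
    function(v) format(v, big.mark = ",", digits = digits, scientific = FALSE),
    character(1)
  )
}

fmt_commas(c(1000000, 10043.9658, 5005.3))
```

Values with more significant digits than `digits` would still be rounded, so the default should be adjusted to the data.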

R read excel file numeric precision problem

I have a number in an excel file that is equal to -29998,1500000003
When I try to open it in R I get
> library(openxlsx)
> posotest <- as.character(read.xlsx("sofile.xlsx"))
> posotest
[1] "-29998.1500000004"
Any help? Desired result: -29998,1500000003
EDIT: with options(digits=13) I get -29998.150000000373 which could explain why the rounding is done, however even with options(digits=13) I get
> as.character(posotest)
[1] "-29998.1500000004"
Do you have any function that would allow me to get the full number in characters?
EDIT2 format does this but it adds artificial noise at the end.
x <- -29998.150000000373
format(x,digits=22)
[1] "-29998.15000000037252903"
How can I know how many digits to use in format since nchar will give me a wrong value?
The file is here
You can get a string with up to 22 digits of precision via format():
x <- -29998.150000000373
format(x,digits=22)
[1] "-29998.15000000037252903"
Of course, this will show you all sorts of ugliness related to trying to represent a decimal number in a binary representation with finite precision ...
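On the question of how many digits to request: a double carries roughly 15 to 17 significant decimal digits, so digits beyond that only show the binary approximation. A small sketch under that assumption, reusing the value from the question:

```r
x <- -29998.150000000373

# 15 significant digits: what a double reliably carries in decimal
format(x, digits = 15)

# 17 significant digits: enough to distinguish any two different doubles
format(x, digits = 17)

# Relative spacing of doubles near 1, for reference
.Machine$double.eps
```

With 15 digits the result matches the as.character() output shown in the question; anything past 17 digits is the "noise" visible in the digits = 22 output.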

Long Numbers As A Character String

As part of my dataset, one of the columns is a series of 24-digit numbers.
Example:
bigonumber <- 429382748394831049284934
When I import it using either data.table::fread or read.csv, it shows up as numeric in exponential format (EG: 4.293827e+23).
options(digits=...) won't work since the number is longer than 22 digits.
When I do
as.character(bigonumber)
what I get is "4.29382748394831e+23"
Is there a way to get bigonumber converted to a character string and show all of the digits as characters? I don't need to do any math on it, but I do need to search against it and do dplyr joins on it.
I need to do this after import, since the column number varies from month to month.
(Yes, in the perfect world, my upstream data provider would use a hash instead of a long number and a static number of columns that stay the same every month, but I don't get to dictate that to them.)
You can specify colClasses on your fread or read.csv statement.
bignums
429382748394831049284934
429382748394831049284935
429382748394831049284936
429382748394831049284937
429382748394831049284938
429382748394831049284939
bignums <- read.csv("~/Desktop/bignums.txt", sep="", colClasses = 'character')
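A self-contained version of this approach can be sketched with the `text` argument of read.csv(), which stands in for the file here. The inline data is an assumption for illustration and contains only the first two values from the listing above.

```r
# Inline stand-in for the file contents shown above
txt <- "bignums
429382748394831049284934
429382748394831049284935"

# Reading the column as character keeps every digit
bignums <- read.csv(text = txt, colClasses = "character")

class(bignums$bignums)
bignums$bignums[1]
```

Because the values never pass through a double, all 24 digits survive and the column can be used for searching and joins.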
You can suppress the scientific notation with
options(scipen=999)
If you define the number then
bigonumber <- 429382748394831049284934
you can convert it into a string:
big.o.string <- as.character(bigonumber)
Unfortunately, this does not work because R converts the number to a double, thereby losing precision:
#[1] "429382748394831019507712"
The last digits are not preserved, as pointed out by @SabDeM. Even setting
options(digits=22)
doesn't help, and in any case 22 is the largest number that is allowed; and in your case there are 24 digits. So it seems that you will have to read the data directly as character or factor. Great answers have been posted showing how this can be achieved.
As a side note, there is a package called gmp that allows using arbitrarily large integer numbers. However, there is a catch: they have to be read as characters (again, in order to prevent R's internal conversion into double).
library(gmp)
bigonumber <- as.bigz("429382748394831049284934")
> bigonumber
Big Integer ('bigz') :
[1] 429382748394831049284934
> class(bigonumber)
[1] "bigz"
The advantage is that you can indeed treat these entries as numbers and perform calculations while preserving all the digits.
> bigonumber * 2
#Big Integer ('bigz') :
#[1] 858765496789662098569868
This package and my answer may not solve your problem, because reading the numbers directly as characters is an easier way to achieve your goal, but I thought I would post this anyway as information for users who need to work with large integers of more than 22 digits.
Use digest::digest on bigonumber to generate an md5 hash of the number yourself?
bigonumber <- 429382748394831049284934
hash_big <- digest::digest(bigonumber)
hash_big
# "e47e7d8a9e1b7d74af6a492bf4f27193"
I saw this before I posted my answer, but I don't see it here anymore.
Set options(scipen) to a large value so that the number is not abbreviated to scientific notation:
options(scipen = 999)
bigonumber <- 429382748394831049284934
bigonumber
# [1] 429382748394831019507712
as.character(bigonumber)
# [1] "429382748394831019507712"
Use "scan" to read the file - the "what" parameter lets you define the input type of each column.
If you want to keep the numbers as numbers, you can't print all of the digits. The digits option allows a maximum of 22 digits; the valid range is 1 to 22. It is used by the print.default method. You can set it with:
options( digits = 22 )
Even with this option, the numbers will change. I don't know why that happens; most likely the object you are about to print (the number) is longer than the allowed number of digits, so R does some weird stuff. I'll investigate.
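The change in the trailing digits comes from double precision rather than from the digits option: R stores the literal as a double, which has a 53-bit significand and cannot represent every integer above 2^53. A short sketch of that limit, reusing `bigonumber` from the question:

```r
bigonumber <- 429382748394831049284934

# Number of bits in the significand of a double
.Machine$double.digits

# Above 2^53, neighbouring integers collapse to the same double
2^53 == 2^53 + 1   # TRUE

# The stored value is the nearest double, not the literal that was typed
sprintf("%.0f", bigonumber) == "429382748394831049284934"   # FALSE
```

No print option can bring back digits that were lost when the literal was converted to a double.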

knitr - strange behaviour for digits

I ran into some trouble concerning the number of digits printed in knitr.
The number does not correspond to the settings [options('digits')].
I know that it was an issue with that about a year ago but has been resolved (https://github.com/yihui/knitr/issues/120).
```{r}
packageVersion("knitr")
options("digits")
a <- 100.101
a
as.character(a)
options(digits=4)
a
options(digits=10)
a
```
This is what I get (the same on two different machines): http://rpubs.com/markheckmann/6715 .
Something is going wrong here and I do not have a clue. Any ideas?
I don't think options(digits=10) is doing what you expect. Perhaps you meant
sprintf( "%.10f",101.101)
# [1] "101.1010000000"
This isn't a knitr issue; it's just how R displays digits. Try your code on its own, without knitting.
a <- 100.101
a
#[1] 100.101
as.character(a)
#[1] "100.101"
options(digits=4)
a
#[1] 100.1
options(digits=10)
a
#[1] 100.101
print doesn't pad numbers with zeroes to make up the width; for that you need format.
format(a, nsmall = 10)
#[1] "100.1010000000"
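The three behaviours discussed here can be compared side by side. This is a small sketch using the value from the question; which one fits depends on whether an upper bound on significant digits or a fixed number of decimals is wanted.

```r
a <- 100.101

# digits is an upper bound on significant digits; nothing is padded
format(a, digits = 10)

# nsmall asks for at least this many decimal places, padding with zeros
format(a, nsmall = 10)

# sprintf gives exactly this many decimal places
sprintf("%.10f", a)
```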
