I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get one arithmetic mean.
Yet if I list the numbers individually and then use the mean() function, I get a different output.
I know that this is caused by rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a single vector of values, not several separate values. It is also a generic function, so it tolerates additional arguments it doesn't understand (and doesn't warn you about them). Compare:
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) # only 1 is used as data; 5 and 6 are matched to the trim and na.rm arguments
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
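As a minimal sketch of the point above (the variable names and the small values are illustrative assumptions, not the asker's data), the correct call takes one vector; if the numbers really are stored in an R list, flattening it with unlist() first is one option:

```r
# Illustrative values only; the asker's real data is not shown in the question.
values <- c(1, 5, 6)
m_vector <- mean(values)            # one numeric vector: all values are averaged

# If the numbers are held in an R list, flatten it before averaging.
as_list <- list(1, 5, 6)
m_list <- mean(unlist(as_list))

# Separate arguments: only the first one is treated as the data.
m_separate <- mean(1, 5, 6)

print(c(m_vector, m_list, m_separate))
```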
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing several separate arguments, and only the first one is treated as the data.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other tools limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic
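It may help to see that options(digits = ...) only changes how a number is displayed, not the value that is stored or computed. A small sketch, assuming made-up input values:

```r
# Hypothetical inputs with many decimal places.
x <- c(0.123456789012, 0.987654321098, 0.555555555555)

old <- options(digits = 4)       # remember the previous setting
m_short <- mean(x)
shown_short <- format(m_short)   # formatted with at most 4 significant digits

options(digits = 15)
m_long <- mean(x)
shown_long <- format(m_long)     # formatted with up to 15 significant digits

options(old)                     # restore the previous setting

# The computed means are the same number; only their display differs.
print(identical(m_short, m_long))
print(c(shown_short, shown_long))
```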
I am dealing with very precise numbers (maximum number of digits).
I noticed that write.csv(x) in R sometimes rounds the numbers.
Has anyone noticed something like that?
What is the default number of digits saved?
As written in the documentation,
In almost all cases the conversion of numeric quantities is governed
by the option "scipen" (see options), but with the internal equivalent
of digits = 15. For finer control, use format to make a character
matrix/data frame, and call write.table on that.
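So the simple solution, as far as console printing goes, is to change the options (note that the passage above says write.table itself uses the equivalent of digits = 15), i.e.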
options(digits = DESIRED_VALUE)
and the customized solution is to convert your variables to character type with format, e.g.
dat <- mtcars
dat$wt <- format(dat$wt, digits = 20)
and save it like this. Note, however, that when using computers we are always dealing with rounded numbers (see Goldberg, 1991, What Every Computer Scientist Should Know About Floating-Point Arithmetic), and you can run into tricky outcomes due to the computer's precision, e.g.
format(2.620, digits = 20)
## [1] "2.6200000000000001066"
So there is nothing "bad" about rounded values, as you probably need them to be precise only up to some number of decimal places. Moreover, your measurements are also affected by measurement errors, so the precision can be illusory.
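To get a feel for how much is lost in practice, here is a rough sketch that writes a few values with write.csv, reads them back, and measures the difference. The data frame, the temporary file, and the names df, path and back are assumptions made for illustration:

```r
# Illustrative values; 2.620 is the example used above.
x <- c(pi, 1/3, 2.620)
df <- data.frame(x = x)

# Write with the default conversion and read the file back.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
back <- read.csv(path)$x
unlink(path)

# The round trip is not exact, but the error is far below typical measurement error.
max_err <- max(abs(back - x))
print(max_err)

# Formatting to character first gives finer control over the digits written.
as_text <- format(x, digits = 17)
err_text <- max(abs(as.numeric(as_text) - x))
print(as_text)
```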
There is an option in R to get control over digit display. For example:
options(digits=10)
is supposed to display calculation results with 10 digits until the end of the R session. In the R help file, the digits parameter is defined as follows:
digits: controls the number of digits
to print when printing numeric values.
It is a suggestion only. Valid values
are 1...22 with default 7
So, it says this is a suggestion only. What if I would like to always display exactly 10 digits, no more and no less?
My second question is: what if I would like to display more than 22 digits, e.g. 100 digits for more precise calculations? Is it possible with base R, or do I need an additional package/function for that?
Edit: Thanks to jmoy's suggestion, I tried sprintf("%.100f",pi) and it gave
[1] "3.1415926535897931159979634685441851615905761718750000000000000000000000000000000000000000000000000000"
which has 48 decimals. Is this the maximum limit R can handle?
The reason it is only a suggestion is that you could quite easily write a print function that ignored the options value. The built-in printing and formatting functions do use the options value as a default.
As to the second question, since R uses finite precision arithmetic, your answers aren't accurate beyond 15 or 16 decimal places, so in general, more aren't required. The gmp and rcdd packages deal with multiple precision arithmetic (via an interface to the gmp library), but this is mostly related to big integers rather than more decimal places for your doubles.
Mathematica or Maple will allow you to give as many decimal places as your heart desires.
EDIT:
It might be useful to think about the difference between decimal places and significant figures. If you are doing statistical tests that rely on differences beyond the 15th significant figure, then your analysis is almost certainly junk.
On the other hand, if you are just dealing with very small numbers, that is less of a problem, since R can handle numbers as small as .Machine$double.xmin (usually about 2e-308).
Compare these two analyses.
x1 <- rnorm(50, 1, 1e-15)
y1 <- rnorm(50, 1 + 1e-15, 1e-15)
t.test(x1, y1) #Should throw an error
x2 <- rnorm(50, 0, 1e-15)
y2 <- rnorm(50, 1e-15, 1e-15)
t.test(x2, y2) #ok
In the first case, differences between numbers only occur after many significant figures, so the data are "nearly constant". In the second case, although the size of the differences between numbers is the same, they are large compared to the magnitude of the numbers themselves.
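The distinction between decimal places and significant figures mentioned above can be sketched with round() and signif(). The value below is an arbitrary small number chosen for illustration:

```r
# An arbitrary very small number.
x <- 1.23456789e-10

# round() works on decimal places: three places after the point leaves nothing.
by_places <- round(x, 3)

# signif() works on significant figures: the leading digits survive.
by_figures <- signif(x, 3)

print(by_places)
print(by_figures)
```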
As mentioned by e3bo, you can use multiple-precision floating point numbers using the Rmpfr package.
library(Rmpfr) # provides mpfr()
mpfr("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825")
These are slower and more memory intensive to use than regular (double precision) numeric vectors, but can be useful if you have a poorly conditioned problem or unstable algorithm.
If you are producing the entire output yourself, you can use sprintf(), e.g.
> sprintf("%.10f",0.25)
[1] "0.2500000000"
specifies that you want to format a floating point number with ten decimal places (in %.10f the f is for float and the .10 specifies ten decimal places).
I don't know of any way of forcing R's higher level functions to print an exact number of digits.
Displaying 100 digits does not make sense if you are printing R's usual numbers, since the best accuracy you can get using 64-bit doubles is around 16 decimal digits (look at .Machine$double.eps on your system). The remaining digits will just be junk.
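A short sketch of what "the remaining digits will just be junk" means, assuming the usual IEEE 754 double format; 0.1 is only an example value:

```r
# Spacing between 1 and the next representable double.
eps <- .Machine$double.eps

# Adding half of that spacing to 1 changes nothing.
unchanged <- (1 + eps / 2) == 1

# 17 significant digits already expose the binary approximation of 0.1.
digits_17 <- sprintf("%.17g", 0.1)

# Asking for more digits prints more of the stored binary value,
# not more of the decimal number you typed.
digits_40 <- sprintf("%.40f", 0.1)

print(eps)
print(unchanged)
print(digits_17)
print(digits_40)
```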
Here is one more solution, which lets you control how many decimal digits are printed as needed (if you don't want to print redundant zeros).
For example, suppose you have a vector called elements and would like to get its sum:
elements <- c(-1e-05, -2e-04, -3e-03, -4e-02, -5e-01, -6e+00, -7e+01, -8e+02)
sum(elements)
## -876.5432
Apparently the last digit, 1, has been truncated; the ideal result would be -876.54321. But if you set a fixed number of decimals, e.g. sprintf("%.10f", sum(elements)), redundant zeros are generated: -876.5432100000
Following a tutorial on printing decimal numbers: if you can identify how many decimal digits a given number has (here, -876.54321 needs 5 decimal digits), you can pass that count as a parameter to formatC, as below:
decimal_length <- 5
formatC(sum(elements), format = "f", digits = decimal_length)
## -876.54321
You can change decimal_length for each query, so it can satisfy different decimal printing requirements.
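If you would rather not work out decimal_length by hand, one alternative is to format with a generous fixed number of decimals and let formatC strip the trailing zeros through its drop0trailing argument. This is a sketch of a different technique from the one above; the choice of 10 decimals is arbitrary:

```r
elements <- c(-1e-05, -2e-04, -3e-03, -4e-02, -5e-01, -6e+00, -7e+01, -8e+02)
total <- sum(elements)

# Fixed notation with 10 decimals keeps the redundant zeros.
padded <- formatC(total, format = "f", digits = 10)

# The same call with drop0trailing = TRUE removes them.
trimmed <- formatC(total, format = "f", digits = 10, drop0trailing = TRUE)

print(padded)
print(trimmed)
```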
If you work primarily with tibbles, there is a function that enforces digits: num().
Here is an example:
library(tidyverse)
data <- tribble(
  ~ weight, ~ weight_selfreport,
  81.5, 81.66969147005445,
  72.6, 72.59528130671505,
  92.9, 93.01270417422867,
  79.4, 79.4010889292196,
  94.6, 96.64246823956442,
  80.2, 79.4010889292196,
  116.2, 113.43012704174228,
  95.4, 95.73502722323049,
  99.5, 99.8185117967332
)
data <-
  data %>%
  mutate(across(where(is.numeric), ~ num(., digits = 3)))
data
#> # A tibble: 9 × 2
#>      weight weight_selfreport
#>   <num:.3!>         <num:.3!>
#> 1    81.500            81.670
#> 2    72.600            72.595
#> 3    92.900            93.013
#> 4    79.400            79.401
#> 5    94.600            96.642
#> 6    80.200            79.401
#> 7   116.200           113.430
#> 8    95.400            95.735
#> 9    99.500            99.819
Thus you can even choose different rounding options depending on your needs. I find it very helpful and a rather quick solution for printing data frames.
I'm a bit confused by e notation and small negative exponents.
I understand that e means 10^exponent,
so 6e5 is equal to 6 * 10^5 = 600000
and 6e-5 is equal to 6 * 10^-5 = 0.00006
But lately I found some configuration files that consist of numbers like:
1.215e-011
1.33e-002
7.20e-004
so how would I go with them?
I understand that the sign shows whether the exponent is positive or negative, but what about the number after the sign? It starts with a zero. So is the zero ignored, or is the exponent a fraction smaller than one?
So what I would like to know is which would be the correct way if my example number is 6e-005:
Way 1: 6e-005 = 6 * 10^-5 = 0.00006
Way 2: 6e-005 = 6 * 10^-0.005 = 5.93131856794
Which is the correct approach? Or is there a third way? Thanks!
Just ignore the leading zeros. 6e-005 == 6e-5.
They are sometimes used so that all numbers in a context have a fixed format.
The exponent is padded with zeros to a fixed width of three digits, so "Way 1" is the correct interpretation.
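A quick way to check this reading is to let a parser convert the strings. The sketch below uses R's as.numeric on the values quoted in the question; the variable names are illustrative:

```r
# Strings as they appear in the configuration files mentioned above.
raw <- c("6e-005", "1.215e-011", "1.33e-002", "7.20e-004")

# Leading zeros in the exponent are simply ignored during conversion.
vals <- as.numeric(raw)

print(vals)
print(format(vals, scientific = FALSE))
```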
I would like the output of my R console to look readable. To this end, I would like R to round all my numbers to the nearest N decimal places. I have some success but it doesn't work completely:
> options(scipen=100, digits=4)
> .000000001
[1] 0.000000001
> .1
[1] 0.1
> 1.23123123123
[1] 1.231
I would like the 0.000000001 to be displayed as simply 0. How does one do this? Let me be more specific: I would like a global fix for the entire R session. I realize I can start modifying things by rounding them but it's less helpful than simply setting things for the entire session.
Look at ?options, specifically the digits and scipen options.
Try:
sprintf("%.4f", 0.00000001)
[1] "0.0000"
Combine what Greg Snow and Ricardo Saporta already gave you to get the right answer: options('scipen'=+20) and options('digits'=2), combined with round(x,4).
round(x,4) will round small near-zero quantities.
Either you round off the results of your regression once and store it:
x <- round(x, 4)
... or else yes, you have to do that every time you display the small quantity, if you don't want to store its rounded value. In your case, since you said small near-zero quantities effectively represent zero, why don't you just round it?
If for some reason you need to keep both the precise and the rounded versions, then do:
x.rounded <- round(x, 4)
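Putting the pieces from this thread together, here is a small sketch using the three numbers from the question. It keeps the precise values, stores a rounded copy, and builds fixed-width display strings with sprintf; the variable names are assumptions:

```r
# The three numbers from the question.
x <- c(0.000000001, 0.1, 1.23123123123)

# Rounded copy: near-zero quantities become exactly 0.
x_rounded <- round(x, 4)

# Display strings with exactly four decimals, regardless of session options.
shown <- sprintf("%.4f", x)

print(x_rounded)
print(shown)
```

The original x is left untouched, so later calculations still use full precision.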