I'm having trouble with:
set.seed(1)
sum(abs(rnorm(100)))
set.seed(1)
cumsum(abs(rnorm(100)))
Why does the printed value of the sum differ from the last value of the cumulative sum, with the cumulative sum preserving all the decimal digits and the sum being rounded to fewer digits?
Also note that this really is about how values are printed, i.e. presented; it does not change the values themselves, e.g. ...
set.seed(1)
d1 <- sum(abs(rnorm(100)))
set.seed(1)
d2 <- cumsum(abs(rnorm(100)))
(d1 == d2)[100]
## [1] TRUE
This is a consequence of the way R prints atomic vectors.
With the default digits option of 7, as you likely have, a value between -1 and 1 that uses its full precision will print with seven decimal places. Because R prints all elements of an atomic vector in a common format, every other value in the vector will also get seven decimal places. Furthermore, a value such as .6264538 with the digits option set to 7 must print with eight digits (0.6264538) because of the leading zero. There are two such values below 1 in your cumulative-sum vector.
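A minimal illustration with made-up numbers: the small element needs seven decimal places to show its seven significant digits, so the large element is padded to match.
c(0.1234568, 50)
# [1]  0.1234568 50.0000000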
If you look at cumsum(abs(rnorm(100)))[100] on its own, the difference disappears: printed as a length-one vector, it shows the same value as sum(abs(rnorm(100))).
set.seed(1)
sum(abs(rnorm(100)))
# [1] 71.67207
set.seed(1)
cumsum(abs(rnorm(100)))[100]
# [1] 71.67207
Notice that both of these printed values show seven significant digits. Probably the most basic example of this is as follows:
0.123456789
#[1] 0.1234568
1.123456789
#[1] 1.123457
11.123456789
# [1] 11.12346
## and so on ...
Related
I am wondering if converting numerics to characters and then back again in R can ever change the number. For example, does as.character() round off numerics after a certain number of decimal places (and if so, how many)?
@jogo thanks for the suggestion :)
Here is the comment as an answer:
From ?as.character():
as.character represents real and complex numbers to 15 significant
digits (technically the compiler's setting of the ISO C constant
DBL_DIG, which will be 15 on machines supporting IEC60559 arithmetic
according to the C99 standard). This ensures that all the digits in
the result will be reliable (and not the result of representation
error), but does mean that conversion to character and back to numeric
may change the number. If you want to convert numbers to character
with the maximum possible precision, use format.
So yes, it does change the number if it has more than 15 significant digits. See:
> as.character(1.000000000000001) # more than 15 significant digits
[1] "1"
> as.character(1.00000000000001) # less than 15 significant digits
[1] "1.00000000000001"
Here are some other examples:
y <- as.numeric(as.character(pi))
identical(y, pi) ### gives FALSE
or
x <- 1/7
y <- as.numeric(as.character(x))
x-y
or
as.numeric(as.character(.Machine$double.xmax)) ## result: Inf
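And, as the help page suggests, format() gives a round-trippable representation; a small sketch (17 significant digits uniquely identify any double):
x <- 1/7
y <- as.numeric(format(x, digits = 17))
identical(x, y) ### gives TRUE this time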
Today I had a look at the pop dataset of the wpp2019 package and noticed that the population numbers are shown as numeric values with a "." in front of the last three digits (e.g. 10500 is shown as 10.500).
library(wpp2019)
data("pop")
pop$`2020`
To remove the dots, I would usually simply turn the column into a character column and then use, for example, stringr::str_replace(), but as soon as I apply any function (except printing) to the population number columns, the dots disappear.
How can it be that this dataset shows e.g. 10.500 when printing the data.frame even though R usually removes the 0 digits after the dot for numeric values? And what would be the best way to remove the dots in the above example without losing the 0 digits?
Expected output
# instead of
pop$`2020`[153]
#[1] 164.1
# this value should return 164100 because printing the data frame
# shows 164.100
Population estimates in wpp2019 are given in thousands. So multiply by 1000 to get back to the estimated number of individuals:
> pop$`2020`[153]*1000
[1] 164100
R sometimes prints the decimal part and sometimes doesn't, depending on the digits setting used by print() and on what else is in the vector being printed. For example:
> print(1234567.890)
[1] 1234568 # max 7 digits printed by default
> print(c(1234567.890,0.011))
[1] 1234567.890 0.011 # but when printed alongside 0.011, all the digits are shown.
This explains why your data frame always shows all the digits but you don't see all the digits when you extract individual numbers.
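Here is a toy illustration of the same effect (made-up numbers, not the wpp2019 data): inside a data frame the whole column shares one format, so the trailing zeros survive, while the extracted value prints with the default 7 significant digits.
d <- data.frame(x = c(164.1, 0.123))
d
#         x
# 1 164.100
# 2   0.123
d$x[1]
# [1] 164.1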
I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get one arithmetic mean, yet if I pass the numbers to mean() individually I get a different output. I know that this is caused by rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a vector of values as its first argument, not multiple values as separate arguments. It is also a generic function, so it is tolerant of additional arguments it doesn't understand (but doesn't warn you about them). See:
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) #only "1" is used here, 5 and 6 are ignored.
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
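If your nine numbers really live in a list object, flatten it with unlist() first; a sketch with hypothetical values standing in for SET1Bearing1slope:
SET1Bearing1slope <- list(0.1234567891, 0.2345678912, 0.3456789123) # hypothetical values
mean(unlist(SET1Bearing1slope))
# [1] 0.2345679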
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing several separate arguments, and only the first one is considered.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other languages limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic
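Also note that options(digits = ...) only changes how values are printed, never the stored values themselves; a quick check:
x <- 1/3
print(x, digits = 4)
# [1] 0.3333
print(x, digits = 12)
# [1] 0.333333333333
x == 1/3
# [1] TRUE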
In one column of a data frame, I have values for longitude. For example:
df<-data.frame(long=c(-169.42000,144.80000,7.41139,-63.07000,-62.21000,14.48333,56.99900))
I want to keep rows which have at least three decimal places (i.e three non-zero values immediately after the decimal point) and delete all others. So rows 1,2,4 and 5 would be deleted from df in the example above.
So far I've tried using grep to extract the rows I want to keep:
new.df<-df[-grep("000$",df$long),]
However this has deleted all rows. Any ideas? I'm new to using grep so there may be glaring errors that I've not picked up on!
Many thanks!
I wouldn't use regex for this.
tol <- .Machine$double.eps ^ 0.5
# use tol <- 0.001 to get the same result as with the regex for numbers like 0.9901
# keep the first two decimals, then discard values whose remaining decimals are
# numerically zero; abs() makes the test work for negative longitudes as well
discard <- abs(df$long - trunc(df$long * 100) / 100) < tol
df[!discard, , drop=FALSE]
# long
# 3 7.41139
# 6 14.48333
# 7 56.99900
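An equivalent check via round(), shown as a sketch: a value has at most two decimal places exactly when rounding it to two decimals leaves it (numerically) unchanged.
keep <- abs(df$long - round(df$long, 2)) >= tol
df[keep, , drop=FALSE]
# long
# 3 7.41139
# 6 14.48333
# 7 56.99900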
You have to modify your regular expression slightly. The following one selects all values with three non-zero digits right after the decimal point:
new.df <- df[grep("\\.[1-9][1-9][1-9]", df$long), ]
I want to get all different float values in a sample:
unique(c(0.100000000002, 0.100000000003))
But this only returns 0.1 twice, which looks as if the values were not unique:
[1] 0.1 0.1
How can I list the exact values that are saved?
That's just R's default printing limit of 7 significant figures. To see the true underlying values:
print(unique(c(0.100000000002, 0.100000000003)), digits=15)
To change the default behaviour, see ?options; you want something like options(digits=15).
Use sprintf...
x <- unique(c(0.100000000002, 0.100000000003))
sprintf("%.20f", x)
#[1] "0.10000000000200000294" "0.10000000000299999470"
From the help page for sprintf...
f
Double precision value, in “fixed point” decimal notation of the
form "[-]mmm.ddd". The number of decimal places ("d") is specified by
the precision: the default is 6; a precision of 0 suppresses the
decimal point. Non-finite values are converted to NA, NaN or (perhaps
a sign followed by) Inf.
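As a side note (a sketch, not part of the original answer): 17 significant digits are enough to uniquely identify any double, so a "%.17g" round-trip recovers the exact stored values:
all(as.numeric(sprintf("%.17g", x)) == x)
# [1] TRUE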
Here you go
options(digits=14)
unique(c(0.100000000002, 0.100000000003))
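With digits raised to 14, the two values now print distinctly:
# [1] 0.100000000002 0.100000000003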