When I read in data that has a few digits after the decimal place, I notice that R doesn't give me the full value when I ask for it, even when I use round(). Any ideas what could be going on? Here are some example data:
test <- data.frame("A"=c("A", "B", "C", "D"), "B"=c(0.254, 0.457, 0.123, 1.089), "C"=c(101.1, 101.2, 354.1234, 354.1235))
When I enter:
test[1,2]
the output is 0.254. But when I try to see the same thing from the 3rd column, where I have numbers with digits from the hundreds place out to the ten-thousandths place (i.e. XXX.XXXX), the output doesn't show any digits after the decimal point. For example, when I enter:
test[3,3]
the output is 354.
When I try:
round(test[3,3], digits=4)
I still get 354. But when I subtract like this:
test[3,3]-test[4,3]
the output is 0.0001. What is going on here and how can I see 4 digits after the decimal when I ask for them?
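A minimal sketch of one likely cause, assuming (the question does not confirm it) that the display option `digits` has been lowered to 3 in this session. Printing uses `getOption("digits")` significant digits, so the stored value is unchanged and only the display is shortened. `round()` cannot help, because it changes the value, not how it is printed. `sprintf()` or `format(..., nsmall = 4)` fix the number of decimals shown:

```r
test <- data.frame("A" = c("A", "B", "C", "D"),
                   "B" = c(0.254, 0.457, 0.123, 1.089),
                   "C" = c(101.1, 101.2, 354.1234, 354.1235))

old <- options(digits = 3)          # assumption: the session uses 3 significant digits
print(test[3, 3])                   # shows 354
print(round(test[3, 3], 4))         # still shows 354: the value is fine, the display is not
print(test[3, 3], digits = 7)       # shows 354.1234

fixed4  <- sprintf("%.4f", test[3, 3])       # "354.1234", always 4 decimals
nsmall4 <- format(test[3, 3], nsmall = 4)    # "354.1234", at least 4 decimals
options(old)                        # restore the previous setting
```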
Related
In R I'm using the prettyNum() function to format some numbers but I'm having a hard time getting the number of digits I want. The help docs say:
the desired number of digits after the decimal point (format = "f") or significant digits (format = "g", = "e" or = "fg").
Default: 2 for integer, 4 for real numbers. If less than 0, the C default of 6 digits is used. If specified as more than 50, 50 will be used with a warning unless format = "f" where it is limited to typically 324. (Not more than 15–21 digits need be accurate, depending on the OS and compiler used. This limit is just a precaution against segfaults in the underlying C runtime.)
To me, this seems to mean that prettyNum(404.5142, digits = 2) should give me "404.51", but in reality it produces "405". Can someone explain how to round to a fixed number (say 2) of digits after the decimal point? I'd like it to include trailing 0s too.
The help file for prettyNum also documents formatC, and the digits parameter belongs to formatC. The prettyNum function itself does not have a parameter called digits.
The reason why this doesn't result in an error is that your argument digits is being passed via ... to format.
... arguments passed to format.
In format, the digits parameter is different from the digits parameter in formatC: it means the number of significant digits, not the number of digits after the decimal point. Yes, this is a bit confusing in the documentation, but it means, for example, that you could do:
prettyNum(404.5142, digits = 5)
#> [1] "404.51"
However, this gives you the wrong number of decimal places for a number of a different magnitude, for example:
prettyNum(44.5142, digits = 5)
#> [1] "44.514"
You would therefore be safer using formatC with format = "f", which fixes the number of digits after the decimal point:
formatC(404.5142, format = "f", digits = 2)
#> [1] "404.51"
and
formatC(44.5142, format = "f", digits = 2)
#> [1] "44.51"
This seems to be what you are looking for.
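The `format = "f"` approach is vectorised and keeps trailing zeros, which the question also asks for. A short sketch (the input vector is illustrative); `sprintf()` with a `%.2f` conversion is a base-R alternative that produces the same strings:

```r
x <- c(404.5142, 44.5142, 6)

# Fixed 2 digits after the decimal point, trailing zeros kept
f1 <- formatC(x, format = "f", digits = 2)   # "404.51" "44.51" "6.00"

# Same result with C-style formatting
f2 <- sprintf("%.2f", x)                     # "404.51" "44.51" "6.00"
```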
Maybe a daft question, but why does R remove the significant 0 at the end of a number? For example, 1.250 becomes 1.25, which does not convey the same precision. I have been trying to calculate the number of significant digits of a number by using as.character() in combination with gsub() and regular expressions (following various posts), but I get the wrong result for numbers such as 1.250, since as.character() removes the last 0 digit. Therefore the answer for 1.250 comes out as 2 digits after the decimal point rather than the correct 3.
To be more specific why this is an issue for me:
I have long tables in Word consisting of bond lengths in a format such as 1.2450(20).
The number in parentheses is the uncertainty in the measurement, which means that the real value is somewhere between 1.2450 - 0.0020 and 1.2450 + 0.0020. I have imported all these data from Word into a large data frame like so:
df<-data.frame(Activity = c(69790, 201420, 17090),
WN1=c(1.7598, 1.759, 1.760),
WN1sd=c(17, 15, 3))
My aim is to plot the WN1 values against Activity, with error bars as well. This means that I would need to manually convert WN1sd to WN1sd=c(0.0017, 0.015, 0.003), which is not the R way to go, hence the need to obtain the number of decimal places of WN1. This works fine for the first two WN1 values but not for the third, since R mistakenly treats the last 0 of 1.760 as not significant.
You have to prepare the standard deviations at the time you import your data from your Word document.
At some point you should have strings like these:
"1.2345(89)" "4.230(34)" "3.100(7)"
This is a function you can apply to those chars and get the sd right:
split.mean.sd = function(mean.sd) {
mean <- gsub("(.*)\\(.*", "\\1", mean.sd)
sd <- gsub(".*\\((.*)\\)", "\\1", mean.sd)
digits.after.dot <- nchar(gsub(".*\\.(.*).*", "\\1", mean))
sd <- as.numeric(sd)*10^(-digits.after.dot)
mean <- as.numeric(mean)
c(mean, sd)
}
For example:
v <- c("1.2345(89)","4.230(34)","3.100(7)")
sapply(v, split.mean.sd)
gives you
1.2345(89) 4.230(34) 3.100(7)
[1,] 1.2345 4.230 3.100
[2,] 0.0089 0.034 0.007
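A vectorised variant of the same idea, sketched as a hypothetical helper (`parse_mean_sd` is not from the answer), returns a data frame and adds the lower and upper limits needed for error bars. Applied to the WN1 values from the question written in their original "mean(sd)" form, it recovers the scaled standard deviations the asker wanted to type by hand:

```r
# Hypothetical helper: parse "mean(sd)" strings, where sd is given in units
# of the last decimal place of the mean, e.g. "1.2450(20)" -> 1.245 +/- 0.002.
parse_mean_sd <- function(x) {
  mean_str <- sub("\\(.*$", "", x)                # text before "("
  sd_str   <- sub("^.*\\((\\d+)\\)$", "\\1", x)   # digits inside "(...)"
  # digits after the decimal point of the mean, counted on the text (0 if no ".")
  n_dec <- ifelse(grepl(".", mean_str, fixed = TRUE),
                  nchar(sub("^[^.]*\\.", "", mean_str)), 0)
  res <- data.frame(mean = as.numeric(mean_str),
                    sd   = as.numeric(sd_str) * 10^(-n_dec))
  res$lower <- res$mean - res$sd                  # error-bar limits
  res$upper <- res$mean + res$sd
  res
}

wn1 <- parse_mean_sd(c("1.7598(17)", "1.759(15)", "1.760(3)"))
wn1$sd   # 0.0017 0.0150 0.0030

# The limits could then be drawn with base graphics, e.g.
# plot(df$Activity, wn1$mean)
# arrows(df$Activity, wn1$lower, df$Activity, wn1$upper,
#        angle = 90, code = 3, length = 0.05)
```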
Most programming languages, R included, do not track the number of significant digits for floating-point values. In many cases significant digits are not needed, and tracking them would slow down computations and require more RAM.
You may be interested in packages for computations with uncertainties, such as the errors package.
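The trailing zero is lost the moment the text is parsed into a double, so the digit information has to be captured while the values are still strings. A minimal sketch, assuming the data can be read as delimited text (the inline CSV here is illustrative): reading the column with `colClasses = "character"` keeps "1.760" intact, whereas `as.character()` on the number cannot recover it.

```r
as.character(1.760)   # "1.76": the double 1.76 carries no record of the final 0

# Read the column as text so that the trailing zero survives
raw <- read.csv(text = "WN1\n1.7598\n1.759\n1.760", colClasses = "character")
raw$WN1               # "1.7598" "1.759" "1.760"

# Digits after the decimal point, counted on the strings
# (assumes every value contains a ".")
n_dec <- nchar(sub("^[^.]*\\.", "", raw$WN1))
n_dec                 # 4 3 3

wn1 <- as.numeric(raw$WN1)   # convert to numbers only after counting
```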
I need to search for specific information within a set of documents that follows the same standard layout.
After using grep to find the keywords in every document, I went on to collect the numbers or characters of interest.
One piece of data I have to collect is the Total Power, which appears as follows:
TotalPower: 986559. (UoPow)
Since I had already correctly selected this excerpt, I created the following function that takes the characters between positions n and m, where n and m are counted from right to left.
substrRight <- function(x, n,m){
substr(x, nchar(x)-n+1, nchar(x)-m)
}
It's important to say that from the ":" to the number 986559, there are 2 spaces; and from the "." to the "(", there's one space.
So I wrote:
TotalP = substrRight(myDf[i],17,9) [1]
where myDf is a character vector with all the relevant observations.
Line [1], after I loop over all my observations, gives me the numbers I want, but I noticed that when the number was 986559, the result was 98655. It simply doesn't "see" 9 as the last number.
The code seems to work fine for the rest of the data. This number (986559) is indeed the highest number in the data and the only one of order 10^5.
How can I make sure that I will gather all digits in every number?
Thank you for the help.
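We can extract the digits before the "." by using a regex lookahead:
library(stringr)
str1 <- "TotalPower:  986559. (UoPow)"
str_extract(str1, "\\d+(?=\\.)")
#[1] "986559"
\\d+ matches one or more digits, and the lookahead (?=\\.) requires them to be followed by a ".", without including the "." in the match.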
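If you would rather avoid the stringr dependency, the same lookahead works in base R via `regexpr(..., perl = TRUE)` and `regmatches()`, and it is vectorised over all observations. A sketch; the second input line is made up for illustration:

```r
x <- c("TotalPower:  986559. (UoPow)",   # line from the question
       "TotalPower:  12345. (UoPow)")    # made-up second line

# First run of digits that is immediately followed by "."
m <- regmatches(x, regexpr("\\d+(?=\\.)", x, perl = TRUE))
m               # "986559" "12345"
as.numeric(m)   # 986559 12345
```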
I am dealing with very precise numbers (maximum number of digits).
I noticed that write.csv(x) in R sometimes rounds the numbers.
Has anyone noticed something like that?
What is the default number of digits saved?
As written in the documentation,
In almost all cases the conversion of numeric quantities is governed
by the option "scipen" (see options), but with the internal equivalent
of digits = 15. For finer control, use format to make a character
matrix/data frame, and call write.table on that.
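Note that, per that passage, setting options(digits = ...) will not change what write.csv writes, because the conversion always uses the internal equivalent of digits = 15; options(scipen = ...) only affects whether scientific notation is used. To control the number of digits, convert your variables to character type with format before writing, e.g.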
dat <- mtcars
dat$wt <- format(dat$wt, digits = 20)
and save it like this. Notice, however, that computers always work with rounded numbers (see Goldberg, 1991, What Every Computer Scientist Should Know About Floating-Point Arithmetic), so you could get tricky outcomes due to floating-point precision, e.g.
format(2.620, digits = 20)
## [1] "2.6200000000000001066"
So there is nothing "bad" about rounded values, as you probably need them to be precise only up to some number of decimal places. Moreover, your measurements are also affected by measurement error, so the extra precision can be illusory.
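A small round-trip check, sketched with an illustrative value, shows the effect of the fixed 15 significant digits: `0.1 + 0.2` is written as `0.3`, so reading the file back does not give the same double, while writing the column pre-formatted with `sprintf("%.17g", ...)` (17 significant digits are enough to identify any double uniquely) does round-trip.

```r
x <- data.frame(v = 0.1 + 0.2)      # 0.30000000000000004 internally

# Default write.csv: numbers are converted with 15 significant digits
f1 <- tempfile(fileext = ".csv")
write.csv(x, f1, row.names = FALSE)
readLines(f1)[2]                    # "0.3"
identical(read.csv(f1)$v, x$v)      # FALSE: the last digits were dropped

# Pre-format with 17 significant digits, then write the character column
x17 <- data.frame(v = sprintf("%.17g", x$v))
f2 <- tempfile(fileext = ".csv")
write.csv(x17, f2, row.names = FALSE, quote = FALSE)
readLines(f2)[2]                    # "0.30000000000000004"
identical(read.csv(f2)$v, x$v)      # TRUE
```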
I need to do rounding like this and convert the result to a character string:
as.character(round(5.9999,2))
I expect it to become 6.00, but it just gives me 6
Is there any way I can make it show 6.00?
Try either one of these:
> sprintf("%3.2f", round(5.9999, digits=2))
[1] "6.00"
> sprintf("%3.2f", 5.999) # no round needed either
[1] "6.00"
There are also formatC() and prettyNum().
To help explain what's going on: the round(5.9999, 2) call rounds your number to the nearest hundredth, which gives you a number (not a string) very close to (or exactly equal to, if you get lucky with floating-point representations) 6.00. Then as.character() looks at that number, takes up to 15 significant digits of it (see ?as.character) to represent it to sufficient accuracy, and determines that only 1 significant digit is necessary. So that's what you get.
As Dirk indicated, formatC() is another option.
formatC(x = 5.999, digits = 2, format = 'f')
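To make the distinction above concrete: `as.character()` picks the shortest representation and drops trailing zeros, whereas `format()` with `nsmall` guarantees a minimum number of decimals. A short sketch; note that `format()` gives all elements of a vector common decimals, so rounding first keeps the result at 2 places:

```r
x <- round(5.9999, 2)

a <- as.character(x)                            # "6": trailing zeros dropped
b <- format(x, nsmall = 2)                      # "6.00": at least 2 decimals
v <- format(round(c(5.9999, 1.5), 2), nsmall = 2)  # "6.00" "1.50"
```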