How to round a number and make it show zeros? - r

The common code in R for rounding a number to, say, 2 decimal places is:
> a = 14.1234
> round(a, digits=2)
[1] 14.12
However if the number has zeros as the first two decimal digits, R suppresses zeros in display:
> a = 14.0034
> round(a, digits=2)
[1] 14
How can we make R show the first decimal digits even when they are zeros? I especially need this in plots. I've searched here, and some people have suggested using options(digits=2), but this makes R behave strangely.

We can use format
format(round(a), nsmall = 2)
#[1] "14.00"
As @arvi1000 mentioned in the comments, we may need to specify the digits in round
format(round(a, digits=2), nsmall = 2)
data
a <- 14.0034

Try this:
a = 14.0034
sprintf('%.2f',a) # 2 digits after decimal
# [1] "14.00"
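Since sprintf is vectorized, the same call can build a whole set of labels at once, which may help with the plotting case in the question. A minimal sketch; the names ticks and tick_labels are only illustrative:

```r
# Illustrative tick positions; any numeric vector works
ticks <- c(14, 14.25, 14.5)

# sprintf() is vectorized: one string with two decimals per element
tick_labels <- sprintf("%.2f", ticks)
print(tick_labels)
# [1] "14.00" "14.25" "14.50"
```

The resulting character vector can be passed as the labels argument of axis(), for example axis(2, at = ticks, labels = tick_labels), after drawing the plot with yaxt = "n".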

The formatC function works nicely if you apply it to the vector after rounding. Here the inner function round rounds to two decimal places, then the outer function formatC formats the result with the same number of decimal places. This essentially re-adds the zeros that would otherwise be dropped (e.g., 14.0034 is rounded to 14, which becomes 14.00).
a=c(14.0034, 14.0056)
formatC(round(a, 2), digits = 2, format = "f")
#[1] "14.00" "14.01"

You can use this function in place of round and call it the same way. Note that it is written in Python, not R:
import decimal

def printf(x, n):
    d = decimal.Decimal(str(x))
    d0 = -(d.as_tuple().exponent)
    if d0 < n:
        print("x = ", x)
    else:
        d1 = decimal.Decimal(str(round(x, n)))
        d2 = d1.as_tuple().exponent
        MAX = n + d2
        if MAX == 0:
            print("x = ", round(x, n))
        else:
            i = 0
            print("x = ", round(x, n), end = '')
            while i != MAX:
                if i == (MAX - 1):
                    print("0")
                else:
                    print("0", end = '')
                i = i + 1
You should then get something like this:
>>> printf(0.500000000000001, 13)
x =  0.5000000000000

Related

Avoiding rounding with formatC

I am using formatC to ensure that a bunch of numbers are all printed to the same length. Some numbers are shorter than the desired length and padded with 0s, and some are longer and truncated. The issue is that formatC rounds in the last digit.
This is fine
> formatC(1, digits = 5, format = 'f')
[1] "1.00000"
I do not like the rounding, I would rather truncate it at the nth digit without rounding.
> formatC(1.234567, digits = 5, format = 'f')
[1] "1.23457"
Is there a way to truncate numbers without rounding in R? I understand that it could be possible to first convert to character and then grab a certain substring of that, but that feels clunky.
It's a little hacky, but you can use trunc with a little multiplication:
trunc(1.234567 * 1e5) / 1e5
# [1] 1.23456
Functionalize it:
trunc2 = function(x, d) trunc(x * 10 ^ d) / 10 ^ d
Then you can
formatC(trunc2(1.234567, 5), digits = 5, format = 'f')
# [1] "1.23456"
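Two properties of this approach are worth checking before relying on it. trunc() cuts toward zero, so negative numbers keep the same digits as positive ones. The multiplication also happens in binary floating point, so a value stored slightly below its decimal spelling can come out one unit lower than expected. A small sketch reusing trunc2 from above:

```r
trunc2 <- function(x, d) trunc(x * 10^d) / 10^d

# Truncation is toward zero, so the sign does not change which digits are kept
formatC(trunc2(-1.234567, 5), digits = 5, format = "f")
# [1] "-1.23456"

# Floating-point caveat: 0.29 is stored slightly below 0.29,
# so 0.29 * 100 is just under 29 and truncates to 28
trunc2(0.29, 2)
# [1] 0.28
```

If the second behaviour matters for your data, truncating the character representation may be the safer route, even if it feels clunkier.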

How do I pre-determine mutually exclusive comparisons?

The human eye can see that no value x satisfies the condition
x<1 & x>2
but how can I make R see that? I want to use this in a function that is passed comparisons (say, as strings) and not necessarily data. Let's say I want to write a function that checks whether a combination of comparisons can ever be fulfilled, like this:
areTherePossibleValues <- function(someString){
someCode
}
areTherePossibleValues("x<1 & x>2")
[1] FALSE
One could do that by interpreting the substrings that are comparison signs and so on, but I feel like there's got to be a better way. The R comparison functions ('<', '>', '==' and so on) themselves might actually be the answer to this, right?
Another option is to use the library validatetools (disclaimer, I'm its author).
library(validatetools)
rules <- validator( r1 = x < 1, r2 = x > 2)
is_infeasible(rules)
# [1] TRUE
make_feasible(rules)
# Dropping rule(s): "r1"
# Object of class 'validator' with 1 elements:
# r2: x > 2
# Rules are evaluated using locally defined options
# create a set of rules that all must hold:
rules <- validator( x > 1, x < 2, x < 2.5)
is_infeasible(rules)
# [1] FALSE
remove_redundancy(rules)
# Object of class 'validator' with 2 elements:
# V1: x > 1
# V2: x < 2
rules <- validator( x >= 1, x < 1)
is_infeasible(rules)
# [1] TRUE
To compare ranges, the minimum of the upper bounds must always be greater than the maximum of the lower bounds, as shown below:
library(dplyr)
library(stringr)
areTherePossibleValues <- function(s) {
str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>%
{lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_extract(., pattern = "[0-9]+"))} %>%
{as.numeric(min(.$max)) > as.numeric(max(.$min))}
}
Update: add inclusion comparison
The only difference is that min of the range max(s) can be equal to the max of the range min(s).
library(dplyr)
library(stringr)
areTherePossibleValues <- function(s) {
str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>%
{lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_remove(., pattern = paste0("^.*", x)))} %>%
{ifelse(sum(grepl(pattern = "=", unlist(.))),
as.numeric(min(str_remove(.$max, "="))) >= as.numeric(max(str_remove(.$min, "="))),
as.numeric(min(.$max)) > as.numeric(max(.$min)))}
}
areTherePossibleValues("x<1 & x>2")
areTherePossibleValues("x>1 & x<2")
areTherePossibleValues("x>=1 & x<1")
Here is my way of solving it. It may not be the best, but it should work even if you have many comparisons.
Let's call the numbers that appear in your comparisons 'cutoffs'. Then all we need to do is test one number between each pair of adjacent cutoffs, one number larger than the max cutoff, and one number smaller than the min cutoff.
The intuition is illustrated with the plot:
Here is the code:
areTherePossibleValues <- function(s){
  # first get the numbers that appear in your string, sort them, and call them the cutoffs
  # (digits and the decimal point are kept, so "2.5" is read as 2.5 rather than 25)
  cutoffs = sort(as.numeric(gsub("[^0-9.]", "", strsplit(s, "&")[[1]])))
  # get the numbers in between each pair of cutoffs, and a bit larger/smaller than the max/min of the cutoffs
  testers = (c(min(cutoffs)-1, cutoffs) + c( cutoffs ,max(cutoffs) + 1))/2
  # take out each comparison
  comparisons = strsplit(s, "&")[[1]]
  # check if ANY tester satisfies all comparisons
  any(sapply(testers, function(te){
    # check if a tester satisfies ALL comparisons
    all(sapply(comparisons, function(co){eval(parse(text =gsub(pattern = 'x',replacement =te, co)))}))
  }))
}
areTherePossibleValues("x<1 & x>2")
#[1] FALSE
areTherePossibleValues("x>1 & x<2 & x < 2.5")
#[1] TRUE
areTherePossibleValues("x >= 1 & x < 1")
#[1] FALSE
We see that x<1 & x>2 is impossible because we are taught a simple rule: if a number x is smaller than another number a, then it cannot be bigger than any number that is bigger than a. More fundamentally, we are using the transitivity property of any partially ordered set. There is no reason we cannot teach a computer (or R) to see that. If the logic string in your question consists only of statements of the form x # a, where # can be <, >, <=, or >=, and the operator is always &, then Yue Y's solution above perfectly answers your question. It can even be generalized to include the | operator. Beyond this, you'll have to be more specific about what the logic expression can be.
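For the restricted case described here (only comparisons of the form x OP number joined by &), one way to let R's own parser do the string handling is to walk the parsed expression and track the tightest lower and upper bound. The sketch below is an illustration under those assumptions, not code from any of the answers; is_satisfiable is a hypothetical name, and the right-hand side of each comparison is assumed to be a plain number:

```r
# Hypothetical helper: TRUE if some x satisfies every comparison in s
is_satisfiable <- function(s) {
  lower <- -Inf; lower_strict <- TRUE   # current bound: x > -Inf
  upper <- Inf;  upper_strict <- TRUE   # current bound: x < Inf

  visit <- function(e) {
    op <- as.character(e[[1]])
    if (op %in% c("&", "(")) {
      # conjunction or parentheses: visit every operand
      for (arg in as.list(e)[-1]) visit(arg)
    } else if (op %in% c(">", ">=")) {
      value <- eval(e[[3]], baseenv())   # assumed to be a numeric constant
      strict <- op == ">"
      if (value > lower || (value == lower && strict)) {
        lower <<- value
        lower_strict <<- strict
      }
    } else if (op %in% c("<", "<=")) {
      value <- eval(e[[3]], baseenv())
      strict <- op == "<"
      if (value < upper || (value == upper && strict)) {
        upper <<- value
        upper_strict <<- strict
      }
    } else {
      stop("unsupported operator: ", op)
    }
    invisible(NULL)
  }

  visit(parse(text = s)[[1]])
  # non-empty if the bounds leave room, or meet at a point both sides include
  lower < upper || (lower == upper && !lower_strict && !upper_strict)
}

is_satisfiable("x<1 & x>2")
# [1] FALSE
is_satisfiable("x>1 & x<2 & x < 2.5")
# [1] TRUE
is_satisfiable("x>=1 & x<1")
# [1] FALSE
```

Keeping a flag for strictness alongside each bound is what lets the sketch tell x >= 1 & x <= 1 (satisfiable) apart from x >= 1 & x < 1 (not satisfiable).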

Round a value dependent on the first significant figure of the uncertainty

I am wondering how I would round a value depending on the value's uncertainty.
For example:
If the value is 0.2563 and the uncertainty on this value is 0.007423, I would like to round the value to 0.256+/-0.007.
I think you can try
x=0.2563
y=0.007423
paste0(round(x,digits = 3),"+-",round(y,digits = 3))
#[1] "0.256+-0.007"
Or with plus-minus character:
paste0(round(x,digits = 3),"\u00B1",round(y,digits = 3))
#[1] "0.256±0.007"
What about a simple:
val <- x + c(-1,1)*y
val
[1] 0.248877 0.263723
round(val, digits = 4)
[1] 0.2489 0.2637
Or another way could be:
x + c(-1, 1)* round(y, 3)
[1] 0.2493 0.2633
I believe it depends on what level of precision you need and which elements (x or y) can or cannot be rounded.
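The answers above hard-code 3 digits. If the number of digits should follow the first significant figure of the uncertainty, as the question title asks, it can be derived with log10. A minimal sketch, assuming a single non-zero uncertainty; round_to_uncertainty is a hypothetical name:

```r
# Hypothetical helper: round value and uncertainty to the position of the
# first significant figure of the uncertainty
round_to_uncertainty <- function(value, uncertainty) {
  # decimal position of that figure, e.g. 0.007423 -> 3, 0.05 -> 2, 60 -> -1
  digits <- -floor(log10(abs(uncertainty)))
  shown <- max(digits, 0)  # number of decimals to print (never negative)
  paste0(
    formatC(round(value, digits), format = "f", digits = shown),
    "+/-",
    formatC(round(uncertainty, digits), format = "f", digits = shown)
  )
}

round_to_uncertainty(0.2563, 0.007423)
# [1] "0.256+/-0.007"
```

One case the sketch does not treat specially is an uncertainty such as 0.0096, which rounds up to 0.010 and therefore gains a leading digit; whether that should change the number of digits is a matter of convention.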

Splitting a number in R

In R I have a number, say 1293828893, called x.
I wish to split this number so as to remove the middle 4 digits 3828 and return them, pseudocode is as follows:
splitnum <- function(number){
#check number is 10 digits
if(nchar(number) != 10){
stop("number not of right size");
}
middlebits <- middle 4 digits of number
return(middlebits);
}
This is a pretty simple question but the only solutions I have found apply to character strings, rather than numeric ones.
If of interest, I am trying to create an implementation in R of the Middle-square method, but this step is particularly tricky.
You can use substr(). See its help page ?substr. In your function I would do:
splitnum <- function(number){
#check number is 10 digits
stopifnot(nchar(number) == 10)
as.numeric(substr(number, start = 4, stop = 7))
}
which gives:
> splitnum(1293828893)
[1] 3828
Remove the as.numeric(....) wrapping on the last line if you want the digits as a string.
Just use integer division:
> x <- 1293828893
> (x %/% 1e3) %% 1e4
[1] 3828
Here's a function that completely avoids converting the number to a character
splitnum <- function(number){
#check number is 10 digits
if(trunc(log10(number))!=9) {
stop("number not of right size")
}
(number %/% 1e3) %% 1e4
}
splitnum(1293828893)
# [1] 3828
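One reason to prefer the arithmetic check is that nchar() works on the character form of the number, and R typically prints round values such as 1000000000 in scientific notation ("1e+09"), which is not 10 characters long. A sketch of the numeric variant with the range check written out; middle_digits is a hypothetical name:

```r
# Hypothetical helper: middle 4 digits of a 10-digit whole number
middle_digits <- function(number) {
  # a 10-digit whole number lies in [1e9, 1e10)
  stopifnot(is.numeric(number), length(number) == 1,
            number >= 1e9, number < 1e10, number == trunc(number))
  # drop the last 3 digits, then keep the last 4 of what remains
  (number %/% 1e3) %% 1e4
}

middle_digits(1293828893)
# [1] 3828
middle_digits(1000000000)
# [1] 0
```

For the Middle-square method mentioned in the question, note that the result is numeric, so leading zeros of the middle block (for example 0382) are not preserved.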

how to return number of decimal places in R

I am working in R. I have a series of coordinates in decimal degrees, and I would like to sort these coordinates by how many decimal places these numbers have (i.e. I will want to discard coordinates that have too few decimal places).
Is there a function in R that can return the number of decimal places a number has, that I would be able to incorporate into function writing?
Example of input:
AniSom4 -17.23300000 -65.81700
AniSom5 -18.15000000 -63.86700
AniSom6 1.42444444 -75.86972
AniSom7 2.41700000 -76.81700
AniLac9 8.6000000 -71.15000
AniLac5 -0.4000000 -78.00000
I would ideally write a script that would discard AniLac9 and AniLac5 because those coordinates were not recorded with enough precision. I would like to discard coordinates for which both the longitude and the latitude have fewer than 3 non-zero decimal values.
You could write a small function for the task with ease, e.g.:
decimalplaces <- function(x) {
if ((x %% 1) != 0) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed=TRUE)[[1]][[2]])
} else {
return(0)
}
}
And run:
> decimalplaces(23.43234525)
[1] 8
> decimalplaces(334.3410000000000000)
[1] 3
> decimalplaces(2.000)
[1] 0
Update (Apr 3, 2018) to address @owen88's report of an error due to rounding of double-precision floating-point numbers, replacing the x %% 1 check:
decimalplaces <- function(x) {
if (abs(x - round(x)) > .Machine$double.eps^0.5) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed = TRUE)[[1]][[2]])
} else {
return(0)
}
}
Here is one way. It checks the first 20 places after the decimal point, but you can adjust the number 20 if you have something else in mind.
x <- pi
match(TRUE, round(x, 1:20) == x)
Here is another way.
nchar(strsplit(as.character(x), "\\.")[[1]][2])
Following up on Roman's suggestion:
num.decimals <- function(x) {
stopifnot(is.character(x))  # the input must be a string, as in the example below
x <- sub("0+$","",x)
x <- sub("^.+[.]","",x)
nchar(x)
}
x <- "5.2300000"
num.decimals(x)
If your data isn't guaranteed to be of the proper form, you should do more checking to ensure other characters aren't sneaking in.
Not sure why this simple approach was not used above (load the pipe from tidyverse/magrittr).
count_decimals = function(x) {
#length zero input
if (length(x) == 0) return(numeric())
#count decimals
x_nchr = x %>% abs() %>% as.character() %>% nchar() %>% as.numeric()
x_int = floor(x) %>% abs() %>% nchar()
x_nchr = x_nchr - 1 - x_int
x_nchr[x_nchr < 0] = 0
x_nchr
}
> #tests
> c(1, 1.1, 1.12, 1.123, 1.1234, 1.1, 1.10, 1.100, 1.1000) %>% count_decimals()
[1] 0 1 2 3 4 1 1 1 1
> c(1.1, 12.1, 123.1, 1234.1, 1234.12, 1234.123, 1234.1234) %>% count_decimals()
[1] 1 1 1 1 2 3 4
> seq(0, 1000, by = 100) %>% count_decimals()
[1] 0 0 0 0 0 0 0 0 0 0 0
> c(100.1234, -100.1234) %>% count_decimals()
[1] 4 4
> c() %>% count_decimals()
numeric(0)
R does not seem to distinguish internally between an input of 1.000 and an input of 1. So, given a vector of decimal numbers, one can see how many digits the values originally had (at least) by taking the maximum of the number of decimals.
Edited: fixed bugs
If someone here needs a vectorized version of the function provided by Gergely Daróczi above:
decimalplaces <- function(x) {
ifelse(abs(x - round(x)) > .Machine$double.eps^0.5,
nchar(sub('^\\d+\\.', '', sub('0+$', '', as.character(x)))),
0)
}
decimalplaces(c(234.1, 3.7500, 1.345, 3e-15))
#> 1 2 3 0
I have tested some solutions and I found this one robust to the bugs reported in the others.
countDecimalPlaces <- function(x) {
if ((x %% 1) != 0) {
strs <- strsplit(as.character(format(x, scientific = F)), "\\.")
n <- nchar(strs[[1]][2])
} else {
n <- 0
}
return(n)
}
# example to prove the function with some values
xs <- c(1000.0, 100.0, 10.0, 1.0, 0, 0.1, 0.01, 0.001, 0.0001)
sapply(xs, FUN = countDecimalPlaces)
In R there is no difference between 2.30000 and 2.3: both are stored as the same number, so one is not more precise than the other, if that is what you want to check. If that is not what you meant and you really want to do this, you can 1) multiply by 10, 2) apply the floor() function, 3) divide by 10, and 4) check for equality with the original. (Be aware, however, that comparing floats for equality is bad practice; make sure this is really what you want.)
For the common application, here's modification of daroczig's code to handle vectors:
decimalplaces <- function(x) {
y = x[!is.na(x)]
if (length(y) == 0) {
return(0)
}
if (any((y %% 1) != 0)) {
info = strsplit(sub('0+$', '', as.character(y)), ".", fixed=TRUE)
info = info[sapply(info, FUN=length) == 2]
dec = nchar(vapply(info, function(p) p[2], character(1)))  # fractional part of every entry
return(max(dec, na.rm=T))
} else {
return(0)
}
}
In general, there can be issues with how a floating point number is stored as binary. Try this:
> sprintf("%1.128f", 0.00000000001)
[1] "0.00000000000999999999999999939458150688409432405023835599422454833984375000000000000000000000000000000000000000000000000000000000"
How many decimals do we now have?
Interesting question. Here is another tweak on the above respondents' work, vectorized, and extended to handle the digits on the left of the decimal point. Tested against negative digits, which would give an incorrect result for the previous strsplit() approach.
If it's desired to only count the ones on the right, the trailingonly argument can be set to TRUE.
nd1 <- function(xx,places=15,trailingonly=F) {
xx<-abs(xx);
if(length(xx)>1) {
fn<-sys.function();
return(sapply(xx,fn,places=places,trailingonly=trailingonly))};
if(xx %in% 0:9) return(!trailingonly+0);
mtch0<-round(xx,nds <- 0:places);
out <- nds[match(TRUE,mtch0==xx)];
if(trailingonly) return(out);
mtch1 <- floor(xx*10^-nds);
out + nds[match(TRUE,mtch1==0)]
}
Here is the strsplit() version.
nd2 <- function(xx,trailingonly=F,...) if(length(xx)>1) {
fn<-sys.function();
return(sapply(xx,fn,trailingonly=trailingonly))
} else {
sum(c(nchar(strsplit(as.character(abs(xx)),'\\.')[[1]][ifelse(trailingonly, 2, T)]),0),na.rm=T);
}
The string version cuts off at 15 digits (actually, I am not sure why the other one's places argument is off by one; the reason it is exceeded, though, is that it counts digits in both directions, so it could go up to twice the size if the number is sufficiently large). There is probably some formatting option to as.character() that can give nd2() an equivalent of the places argument of nd1().
nd1(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0));
# 2 2 1 3 1 4 16 17 1
nd2(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0));
# 2 2 1 3 1 4 15 15 1
nd1() is faster.
rowSums(replicate(10,system.time(replicate(100,nd1(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0))))));
rowSums(replicate(10,system.time(replicate(100,nd2(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0))))));
Don't mean to hijack the thread, just posting it here as it might help someone to deal with the task I tried to accomplish with the proposed code.
Unfortunately, even @daroczig's updated solution didn't work for me to check whether a number has fewer than 8 decimal digits.
@daroczig's code:
decimalplaces <- function(x) {
if (abs(x - round(x)) > .Machine$double.eps^0.5) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed = TRUE)[[1]][[2]])
} else {
return(0)
}
}
In my case produced the following results
NUMBER / NUMBER OF DECIMAL DIGITS AS PRODUCED BY THE CODE ABOVE
[1] "0.0000437 7"
[1] "0.000195 6"
[1] "0.00025 20"
[1] "0.000193 6"
[1] "0.000115 6"
[1] "0.00012501 8"
[1] "0.00012701 20"
etc.
So far I was able to accomplish the required tests with the following clumsy code:
if (abs(x*10^8 - floor(as.numeric(as.character(x*10^8)))) > .Machine$double.eps*10^8)
{
print("The number has more than 8 decimal digits")
}
PS: I might be missing something in regard to not taking the root of the .Machine$double.eps so please take caution
Another contribution, keeping fully as numeric representations without converting to character:
countdecimals <- function(x)
{
n <- 0
while (!isTRUE(all.equal(floor(x),x)) & n <= 1e6) { x <- x*10; n <- n+1 }
return (n)
}
Vector solution based on daroczig's function (can also deal with dirty columns containing strings and numerics):
decimalplaces_vec <- function(x) {
vector <- c()
for (i in 1:length(x)){
if(!is.na(as.numeric(x[i]))){
if ((as.numeric(x[i]) %% 1) != 0) {
vector <- c(vector, nchar(strsplit(sub('0+$', '', as.character(x[i])), ".", fixed=TRUE)[[1]][[2]]))
}else{
vector <- c(vector, 0)
}
}else{
vector <- c(vector, NA)
}
}
return(max(vector))
}
as.character uses scientific notation for numbers that are between -1e-4 and 1e-4 but not zero:
> as.character(0.0001)
[1] "1e-04"
You can use format(scientific=F) instead:
> format(0.0001,scientific=F)
[1] "0.0001"
Then do this:
nchar(sub("^-?\\d*\\.?","",format(x,scientific=F)))
Or in vectorized form:
> nplaces=function(x)sapply(x,function(y)nchar(sub("^-?\\d*\\.?","",format(y,scientific=F))))
> nplaces(c(0,-1,1.1,0.123,1e-8,-1e-8))
[1] 0 0 1 3 8 8
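Applied to the data in the question, a counter of this kind can drive the filtering step directly. The sketch below is only an illustration: count_places is a hypothetical variant of nplaces, the column names lat and lon are assumptions (the question does not label its columns), and digits = 15 is passed because format() otherwise uses the default of 7 significant digits.

```r
# Hypothetical variant of nplaces(): decimals in the non-scientific
# representation, using up to 15 significant digits
count_places <- function(x) {
  vapply(x, function(y) {
    nchar(sub("^-?\\d*\\.?", "", format(y, scientific = FALSE, digits = 15)))
  }, integer(1))
}

# Coordinates from the question; which column is latitude is an assumption
coords <- data.frame(
  id  = c("AniSom4", "AniSom5", "AniSom6", "AniSom7", "AniLac9", "AniLac5"),
  lat = c(-17.233, -18.15, 1.42444444, 2.417, 8.6, -0.4),
  lon = c(-65.817, -63.867, -75.86972, -76.817, -71.15, -78),
  stringsAsFactors = FALSE
)

# Discard rows where BOTH coordinates have fewer than 3 decimal places
too_coarse <- count_places(coords$lat) < 3 & count_places(coords$lon) < 3
kept <- coords[!too_coarse, ]
kept$id
# [1] "AniSom4" "AniSom5" "AniSom6" "AniSom7"
```

Because trailing zeros are not stored in a numeric value, this only works on what survives the conversion to numeric; if the zeros in the original file carry meaning, the coordinates would have to be read and counted as strings.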

Resources