Avoiding rounding with formatC - r

I am using formatC to ensure that a bunch of numbers are all printed to the same length. Some numbers are shorter than the desired length and padded with 0s, and some are longer and truncated. The issue is that formatC rounds in the last digit.
This is fine
> formatC(1, digits = 5, format = 'f')
[1] "1.00000"
I do not like the rounding; I would rather truncate at the nth digit without rounding.
> formatC(1.234567, digits = 5, format = 'f')
[1] "1.23457"
Is there a way to truncate numbers without rounding in R? I understand that it could be possible to first convert to character and then grab a certain substring of that, but that feels clunky.

It's a little hacky, but you can use trunc with a little multiplication:
trunc(1.234567 * 1e5) / 1e5
# [1] 1.23456
Functionalize it:
trunc2 = function(x, d) trunc(x * 10 ^ d) / 10 ^ d
Then you can
formatC(trunc2(1.234567, 5), digits = 5, format = 'f')
# [1] "1.23456"
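As a minimal sketch of how the trunc2 helper above behaves on a vector, the example below redefines it so the block runs on its own and formats a few values, including a negative one (trunc truncates toward zero). The sample vector is made up for illustration. Because x * 10^d is computed in binary floating point, a product that lands just below an integer may lose one more digit than expected, so treat this as an approximation rather than an exact decimal truncation.

```r
# Truncate x toward zero at d decimal places (same idea as trunc2 above)
trunc2 <- function(x, d) trunc(x * 10^d) / 10^d

# Illustrative values: one short, two long, one negative
x <- c(1, 1.234567, 2.999999, -1.234567)

# Short values are zero-padded, long values are cut off without rounding
res <- formatC(trunc2(x, 5), digits = 5, format = "f")
print(res)
# [1] "1.00000"  "1.23456"  "2.99999"  "-1.23456"
```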

Related

Weighted sum of digits in R

I am trying to figure out the most efficient way to calculate the weighted sum of digits for a numeric string (where the weight is equal to the position of the digit in the numeric string).
Example: For the number 1059, the weighted sum of digits is calculated as 1 * 1 + 0 * 2 + 5 * 3 + 9 * 4 = 52
I would like to allow for the input to be of any length, but if there are more efficient ways when there is a limit to the string length (e.g. knowing that the number is no more than 10 digits allows for a more efficient program) I am open to that too. Also, if it is preferred that the input is of type numeric rather than character, that is acceptable too.
What I have right now is an old fashioned for loop:
wsod <- function(str) {
  output <- 0
  for (pos in 1:nchar(str)) {
    digit <- as.numeric(substr(str, pos, pos))
    output <- output + pos * digit
  }
  output
}
A few solutions have been proposed for Python (using a numeric input) but I don't think they apply to R directly.
> number <- 1059
> x <- strsplit(as.character(number), "")[[1]]
> y <- seq_len(nchar(number))
> as.numeric(as.numeric(x) %*% y)
[1] 52
weighted.digit <- function(str) {
  splitted.nums <- as.numeric(strsplit(str, '')[[1]])
  return(sum(splitted.nums * 1:length(splitted.nums)))
}
weighted.digit('1059')
[1] 52
One could modify this to accept a numeric input, and then simply convert that to character as a first step.
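The modification mentioned above could look roughly like the sketch below. The function name weighted_digit_sum is hypothetical, and the sketch assumes a single non-negative whole number that as.character prints without scientific notation or a decimal point (very large numerics would need format(x, scientific = FALSE) or a character input instead).

```r
# Weighted digit sum that accepts either a numeric or a character input.
# Assumption: x is one non-negative whole number without exponent notation.
weighted_digit_sum <- function(x) {
  digits <- as.integer(strsplit(as.character(x), "", fixed = TRUE)[[1]])
  # seq_along() gives the 1-based position of each digit
  sum(digits * seq_along(digits))
}

weighted_digit_sum(1059)    # numeric input
# [1] 52
weighted_digit_sum("1059")  # character input
# [1] 52
```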

How to round a number and make it show zeros?

The common code in R for rounding a number to say 2 decimal points is:
> a = 14.1234
> round(a, digits=2)
[1] 14.12
However if the number has zeros as the first two decimal digits, R suppresses zeros in display:
> a = 14.0034
> round(a, digits=2)
[1] 14
How can we make R show the first decimal digits even when they are zeros? I especially need this in plots. I've searched here and some people have suggested using options(digits=2), but this makes R behave strangely.
We can use format
format(round(a), nsmall = 2)
#[1] "14.00"
As @arvi1000 mentioned in the comments, we may need to specify the digits in round
format(round(a, digits=2), nsmall = 2)
data
a <- 14.0034
Try this:
a = 14.0034
sprintf('%.2f',a) # 2 digits after decimal
# [1] "14.00"
The formatC function works nicely if you apply it to the vector after rounding. Here the inner function round rounds to two decimal places, then the outer function formatC formats to the same number of decimal places as the numbers were rounded to. This essentially re-adds the zeros that would otherwise be dropped (e.g., 14.0034 is rounded to 14, which becomes 14.00). Note that digits must be passed by name, because the second positional argument of formatC is width.
a=c(14.0034, 14.0056)
formatC(round(a,2),digits=2,format="f")
#[1] "14.00" "14.01"
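To compare the approaches suggested so far side by side, here is a small sketch using a made-up three-element vector. All three calls return character vectors, which is what axis labels and plot annotations need; for this input they happen to agree.

```r
# Illustrative input: the first two values round to a trailing zero
a <- c(14.0034, 14.0056, 14.1234)

via_sprintf <- sprintf("%.2f", a)
via_formatC <- formatC(a, digits = 2, format = "f")
via_format  <- format(round(a, digits = 2), nsmall = 2)

print(via_sprintf)
# [1] "14.00" "14.01" "14.12"
```

sprintf and formatC round and pad in one step, while format needs the explicit round call first.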
You can use this function (written in Python, not R) instead of round; call it the same way you would call round.
import decimal

def printf(x, n):
    # number of decimal places in x as written
    d = decimal.Decimal(str(x))
    d0 = -(d.as_tuple().exponent)
    if d0 < n:
        print("x = ", x)
    else:
        # number of trailing zeros that round() dropped
        d1 = decimal.Decimal(str(round(x, n)))
        d2 = d1.as_tuple().exponent
        MAX = n + d2
        if MAX == 0:
            print("x = ", round(x, n))
        else:
            i = 0
            print("x = ", round(x, n), end = '')
            while i != MAX:
                if i == (MAX - 1):
                    print("0")
                else:
                    print("0", end = '')
                i = i + 1
So you should get something like this:
>>> printf(0.500000000000001, 13)
x =  0.5000000000000

Fast way to get numeric precision and scale (n/o decimal points) for a numeric vector

I have a vector with many numbers (> 1E9 elements) and want to derive
the numeric precision (number of digits in a number) and numeric scale (the number of digits to the right of the decimal point in a number).
How can I do this very fast (vectorized)?
There exists a question with a partial answer (how to return number of decimal places in R), but the solution is neither fast (vectorized) nor does it calculate the numeric precision.
Example:
# small example vector with numeric data
x <- c(7654321, 54321.1234, 321.123, 321.123456789)
> numeric.precision(x) # implementation is the answer
[1] 7, 9, 6, 12
> numeric.scale(x) # implementation is the answer
[1] 0, 4, 3, 9
Optional "sugar" (added later to this question, thanks to @thc and @gregor):
How can I avoid over-counting the number of digits caused by the imprecise internal representation of numbers (e.g. floating point)?
> x = 54321.1234
> as.character(x)
[1] "54321.1234"
> print(x, digits = 22)
[1] 54321.12339999999676365
Here is a base R method to start with. It is bound to be too slow, but at least it calculates the desired results.
# precision
nchar(sub(".", "", x, fixed=TRUE))
[1] 7 9 6 12
# scale
nchar(sub("\\d+\\.?(.*)$", "\\1", x))
[1] 0 4 3 9
For this method, I'd recommend using the colClasses argument of data.table's fread to avoid numeric conversion and its precision issues in the first place:
x <- unlist(fread("7654321
54321.1234
321.123
321.123456789", colClasses="character"), use.names=FALSE)
It may be necessary to convert the vector to numeric during the input, as mentioned in the comments, for example when some of the input values are in scientific notation in the text file. In this instance, using a formatting statement or options(scipen=999) to force the conversion from this format to standard decimal format may be necessary as noted in this answer.
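The base R one-liners in this answer can be wrapped into the numeric.precision and numeric.scale functions the question asks for. This is only a sketch: it assumes inputs that as.character renders without scientific notation and with "." as the decimal mark, and the function bodies are an illustration rather than a benchmarked solution. Both functions are vectorized because sub and nchar are.

```r
# Total number of digits: drop the decimal point and an optional sign, then count
numeric.precision <- function(x) {
  s <- sub("^-", "", as.character(x))
  nchar(sub(".", "", s, fixed = TRUE))
}

# Digits after the decimal point: strip sign, integer part and the point itself
numeric.scale <- function(x) {
  nchar(sub("^-?\\d+\\.?", "", as.character(x)))
}

x <- c(7654321, 54321.1234, 321.123, 321.123456789)
numeric.precision(x)
# [1]  7  9  6 12
numeric.scale(x)
# [1] 0 4 3 9
```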
Here is the idea of a math-based version (faster than manipulating characters). You can put this into two functions, scale and precision, where the precision function calls the scale function.
for (i in 1:length(x)) {
  after <- 0
  while (x[i] * (10^after) != round(x[i] * (10^after))) {
    after <- after + 1
  }
  cat(sprintf("Scale: %s\n", after))
  before <- floor(log10(abs(x[i]))) + 1
  cat(sprintf("Precision: %s\n", before + after))
}
Result:
Scale: 0
Precision: 7
Scale: 4
Precision: 9
Scale: 3
Precision: 6
Scale: 9
Precision: 12
To consolidate all comments and answers into one ready-to-use solution that also considers different countries (locales) and NA, I am posting this as an answer (please give credit to @Imo, @Gregor et al.).
Edit (Feb 09, 2017): Added the SQL.precision as return value since it may be different from the mathematical precision.
#' Calculates the biggest precision and scale that occurs in a numeric vector
#'
#' The scale of a numeric is the count of decimal digits in the fractional part (to the right of the decimal point).
#' The precision of a numeric is the total count of significant digits in the whole number,
#' that is, the number of digits to both sides of the decimal point.
#'
#' To create a suitable numeric data type in a SQL data base use the returned \code{SQL.precision} which
#' is defined by \code{max(precision, non.fractional.precision + scale)}.
#'
#' @param x numeric vector
#'
#' @return A list with four elements:
#' precision (total number of significant digits in the whole number),
#' scale (number of digits in the fractional part),
#' non.fractional.precision (number of digits to the left of the decimal point) and SQL.precision.
#'
#' @details NA will be counted as precision 1 and scale 0!
#'
#' @examples
#'
#' \preformatted{
#' x <- c(0, 7654321, 54321.1234, 321.123, 321.123456789, 54321.1234, 100000000000, 1E4, NA)
#' numeric.precision.and.scale(x)
#' numeric.precision.and.scale(c(10.0, 1.2)) # shows why the SQL.precision is different
#' }
numeric.precision.and.scale <- function(x) {
  # Avoid scientific notation when converting numerics to strings and
  # restore the previous setting on exit, whichever branch is taken
  old.options <- options(scipen = 999)
  on.exit(options(old.options), add = TRUE)
  # Extract the decimal point character of the computer's current locale
  decimal.sign <- substr(1 / 2, 2, 2)
  x.string <- as.character(x[!is.na(x)])
  if (length(x.string) > 0) {
    # calculate
    precision <- max(nchar(sub(decimal.sign, "", x.string, fixed = TRUE)))
    scale <- max(nchar(sub(paste0("\\d+\\", decimal.sign, "?(.*)$"), "\\1", x.string)))
    non.fractional.precision <- max(trunc(log10(abs(x))) + 1, na.rm = TRUE)
    SQL.precision <- max(precision, non.fractional.precision + scale)
  } else {
    precision <- 1
    scale <- 0
    non.fractional.precision <- 1
    SQL.precision <- 1
  }
  return(list(precision = precision,
              scale = scale,
              non.fractional.precision = non.fractional.precision,
              SQL.precision = SQL.precision))
}

R Programming - convert numeric to MM:SS

This may be a silly question, but I can't find anything on this.
I have a numeric value that represents seconds. How can I convert it to MM:SS?
For example
My number is 96
Represented in MM:SS it should be 01:36.
Any help is appreciated.
The %/% (integer division) and %% (modulo) operators are your friends:
x <- 96
paste(x %/% 60, x %% 60, sep = ":")
which gives
> paste(x %/% 60, x %% 60, sep = ":")
[1] "1:36"
Here it is in a function:
d2ms <- function(x) {
  paste(x %/% 60, x %% 60, sep = ":")
}
> xx <- c(120, 96, 45, 30)
> d2ms(xx)
[1] "2:0" "1:36" "0:45" "0:30"
Which shows we need a little help to get exactly the format you need; see ?sprintf for ways to format numbers [as characters] with leading 0s etc:
d2ms <- function(x) {
  sprintf("%02d:%02d", x %/% 60, x %% 60)
}
> d2ms(xx)
[1] "02:00" "01:36" "00:45" "00:30"
Note that the : in the string above is a literal; the %02d parts are the formats for the values given in the next two arguments and include the padding details (i.e. pad with zeros until the number uses two digits). The template for this usage is:
%[flag][width]specifier,
where here we used:
0 as the flag --- pad with 0s
width was 2, we want MM or SS
specifier was d for integers (could also have been i)
Whether you need that or not is up to your end use case.
These operators are quite useful for these sorts of operations; another example would be converting from degrees, minutes, seconds notation to decimal degrees for spatial coordinates.
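One thing to watch with the %d specifier is that sprintf rejects values that are not whole numbers, so fractional seconds would raise an error. The sketch below works around this by rounding first; the function name sec_to_mmss and the decision to round to the nearest second are assumptions made for illustration, not part of the answer above.

```r
# Convert seconds to "MM:SS", rounding to the nearest whole second first.
# Assumption: x is non-negative; minutes are not wrapped into hours.
sec_to_mmss <- function(x) {
  x <- as.integer(round(x))
  sprintf("%02d:%02d", x %/% 60L, x %% 60L)
}

out <- sec_to_mmss(c(120, 96, 45.4, 30))
print(out)
# [1] "02:00" "01:36" "00:45" "00:30"
```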
Try:
x<-96
sprintf("%02d:%02d",x%/%60,x%%60)
#[1] "01:36"

Splitting a number in R

In R I have a number, say 1293828893, called x.
I wish to split this number so as to remove the middle 4 digits 3828 and return them, pseudocode is as follows:
splitnum <- function(number){
  #check number is 10 digits
  if(nchar(number) != 10){
    stop("number not of right size");
  }
  middlebits <- middle 4 digits of number
  return(middlebits);
}
This is a pretty simple question but the only solutions I have found apply to character strings, rather than numeric ones.
If of interest, I am trying to create an implementation in R of the Middle-square method, but this step is particularly tricky.
You can use substr(). See its help page ?substr. In your function I would do:
splitnum <- function(number){
  #check number is 10 digits
  stopifnot(nchar(number) == 10)
  as.numeric(substr(number, start = 4, stop = 7))
}
which gives:
> splitnum(1293828893)
[1] 3828
Remove the as.numeric(....) wrapping on the last line if you want the digits as a string.
Just use integer division:
> x <- 1293828893
> (x %/% 1e3) %% 1e4
[1] 3828
Here's a function that completely avoids converting the number to a character
splitnum <- function(number){
  #check number is 10 digits
  if(trunc(log10(number)) != 9) {
    stop("number not of right size")
  }
  (number %/% 1e3) %% 1e4
}
splitnum(1293828893)
# [1] 3828
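The same integer-division idea generalizes to other positions. The sketch below is a hypothetical variant (the name middle_digits and its parameters n_digits, drop_right and keep are invented for illustration): it drops drop_right digits from the right with %/% and keeps the next keep digits with %%. It assumes a positive whole number small enough to be represented exactly as a double.

```r
# Extract `keep` digits after discarding `drop_right` digits from the right.
# Assumption: number is a positive whole number with exactly n_digits digits.
middle_digits <- function(number, n_digits = 10, drop_right = 3, keep = 4) {
  stopifnot(floor(log10(abs(number))) + 1 == n_digits)
  (number %/% 10^drop_right) %% 10^keep
}

middle_digits(1293828893)
# [1] 3828
```

The defaults reproduce the 10-digit case from the question, where digits 4 to 7 are wanted.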
