How to format a number with a specified level of precision in R?

I would like to create a function that returns a vector of numbers rounded to n significant figures, without trailing zeros and not in scientific notation.
For example, I would like
somenumbers <- c(0.000001234567, 1234567.89)
myformat(x = somenumbers, n = 3)
to return
[1] 0.00000123 1230000
I have been playing with format, formatC, and sprintf, but they don't seem to format each number independently, and they return character strings (printed in quotes).
This is the closest I have gotten:
> format(signif(somenumbers,4), scientific=FALSE)
[1] " 0.000001235" "1235000.000000000"

You can use the signif function to round to a given number of significant digits. If you don't want extra trailing zeros, don't print() the results; output them some other way instead, for example with cat():
> somenumbers <- c(0.000001234567, 1234567.89)
> options(scipen=5)
> cat(signif(somenumbers,3),'\n')
0.00000123 1230000

sprintf seems to do it, although you have to pick a format string for each element:
sprintf(c("%1.8f", "%1.0f"), signif(somenumbers, 3))
[1] "0.00000123" "1230000"

How about:
myformat <- function(x, n) {
  noquote(sapply(x, function(v) format(signif(v, n), scientific = FALSE)))
}
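A variant of the function above, sketched under the assumption that a character vector is acceptable output. The question's expected output is shown without quotes, which print(..., quote = FALSE) reproduces. Calling format() on one element at a time is what stops the numbers from sharing a common number of decimals.

```r
# Round each element of x to n significant figures and format it in fixed
# (non-scientific) notation. format() is applied to one element at a time,
# so each number gets its own number of decimals instead of a shared one.
myformat <- function(x, n) {
  rounded <- signif(x, n)
  # vapply() checks that every call returns exactly one string
  vapply(rounded, format, character(1), scientific = FALSE)
}

somenumbers <- c(0.000001234567, 1234567.89)
out <- myformat(somenumbers, 3)
print(out, quote = FALSE)
# [1] 0.00000123 1230000
```

Because format() drops digits it does not need, a value such as 1.5 comes back as "1.5" rather than "1.50".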

Related

How to only show two decimal points without rounding

For a value like 0.999942062, I would like to show this value as 0.99 without rounding to 1. For that, I tried the following functions:
x <- 0.999942062
round(x, 2)
formatC(x, 2)
sprintf(x, fmt = '%#.2f')
They all result in 1. Do you know how to print it like 0.99?
You could use substr() after converting the number to character:
substr(as.character(0.999942062), 1, 4)
# [1] "0.99"
Use trunc(x * 10^n) / 10^n, since you simply want to truncate the number (i.e. round it towards 0).
floor(x * 10^n) / 10^n achieves the same for positive numbers.
Edit: swapped floor and trunc, which I had mixed up, as @Sotos and @Roland pointed out.
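A minimal sketch of the trunc() approach, wrapped in a hypothetical helper truncate_decimals() (the name is an assumption, not from the answer), with floor() shown for contrast on a negative number:

```r
# Hypothetical helper: cut x towards zero after n decimal places
# instead of rounding it.
truncate_decimals <- function(x, n = 2) {
  trunc(x * 10^n) / 10^n
}

x <- c(0.999942062, 20, -1.2345)
truncate_decimals(x, 2)
# [1]  0.99 20.00 -1.23

# floor() agrees with trunc() for positive numbers only:
floor(x * 100) / 100
# [1]  0.99 20.00 -1.24

# sprintf() can then show exactly two decimals; the value is already
# truncated, so "%.2f" has nothing left to round up to 1.00.
sprintf("%.2f", truncate_decimals(x, 2))
```

One caveat of floating-point arithmetic: x * 10^n can land just below a whole number (0.29 * 100 is slightly less than 29), in which case trunc() cuts one unit too far and 0.29 becomes 0.28.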
x = c(0.999942062, 20)
gsub("(.*)(\\.)(.{2}).*", "\\1\\2\\3", x)
#[1] "0.99" "20"
If your number could have a varying number of digits to the left of the decimal point, use regexpr() inside substr() (after converting to character) to locate the decimal point, then keep the two characters after it. Like so:
substr(as.character(10.999462), 1, regexpr("\\.", as.character(10.999462)) + 2)
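The same regexpr()/substr() idea works on a whole vector at once. This is a sketch with a hypothetical function name, keep_two_decimals(), and it assumes as.character() gives fixed notation, which is not the case for very small or very large values (as.character(1.5e-07) is "1.5e-07"):

```r
# Keep everything up to two characters after the decimal point.
keep_two_decimals <- function(x) {
  s <- as.character(x)
  # as.integer() drops the match attributes; -1 means no decimal point
  dot <- as.integer(regexpr(".", s, fixed = TRUE))
  # Numbers without a decimal point are returned unchanged
  ifelse(dot > 0, substr(s, 1, dot + 2), s)
}

keep_two_decimals(c(0.999942062, 10.999462, 123.4, 20))
# [1] "0.99"  "10.99" "123.4" "20"
```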

put left padded zeros inside string

I want to write a function that takes a character vector (including numbers) as input and left-pads the numbers in it with zeros. For example, this could be an input vector:
x<- c("abc124.kk", "77kk-tt", "r5mm")
x
[1] "abc124.kk" "77kk-tt" "r5mm"
Each string of the input vector contains only one number, but the numbers are in different positions (some at the end, some in the middle, ...).
I want the output to look like this:
"abc124.kk" "077kk-tt" "r005mm"
That means putting as many leading zeros in front of the number in each string as needed so that it has as many digits as the longest number.
But I want a function that does this for any input, not only my example (the x vector).
I have already started extracting the numbers and letters and padded the numbers the way I want them, but how can I put them back together in the right position?
library(stringr)
my_function <- function(x) {
  letters <- str_extract_all(x, "[a-z]+")
  numbers <- str_extract_all(x, "[0-9]+")
  digit_width <- max(nchar(numbers))
  numbers_correct <- str_pad(numbers, width = digit_width, pad = "0")
}
And what if I have a vector that includes some strings without numbers? How can I exclude them and get them back without any changes?
For example, if the input were
y <- c("12ab", "cd", "ef345")
the numbers variable looks like this:
[[1]]
[1] "12"
[[2]]
character(0)
[[3]]
[1] "345"
In this case I would want the output to look like this:
"012ab" "cd" "ef345"
An option would be to use gsubfn to capture the digits, convert them to numeric and then pass them to sprintf for formatting (here the width of 3 digits is hard-coded):
library(gsubfn)
gsubfn("([0-9]+)", ~ sprintf("%03d", as.numeric(x)), x)
#[1] "abc124.kk" "077kk-tt" "r005mm"
Another option computes the target width from the data and leaves strings without digits unchanged:
x <- c("12ab", "cd", "ef345")
s <- gsub("\\D", "", x)   # the digits of each string
n <- nchar(s)
max_n <- max(n)
sapply(seq_along(x), function(i) {
  if (n[i] < max_n) {
    zeroes <- paste(rep(0, max_n - n[i]), collapse = "")
    gsub("\\d+", paste0(zeroes, s[i]), x[i])
  } else {
    x[i]
  }
})
#[1] "012ab" "cd" "ef345"
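A base-R sketch that finishes the idea from the question (extract the numbers, pad them, put them back) without stringr or gsubfn. The name pad_numbers() is hypothetical, and it assumes each string contains at most one number, as stated in the question: only the first run of digits is padded. regmatches<- writes each padded number back at the position where it was found.

```r
# Pad the first run of digits in each string with leading zeros so that all
# numbers have as many digits as the longest one. Strings without digits are
# returned unchanged.
pad_numbers <- function(x) {
  m <- regexpr("[0-9]+", x)        # first digit run per string, -1 if none
  nums <- regmatches(x, m)         # matched digit runs only
  if (length(nums) == 0) return(x) # nothing to pad
  width <- max(nchar(nums))
  zeros <- strrep("0", width - nchar(nums))
  regmatches(x, m) <- paste0(zeros, nums)
  x
}

pad_numbers(c("abc124.kk", "77kk-tt", "r5mm"))
# [1] "abc124.kk" "077kk-tt"  "r005mm"
pad_numbers(c("12ab", "cd", "ef345"))
# [1] "012ab" "cd"    "ef345"
```

Padding with strrep() keeps the digits as text, so leading zeros that were already there and numbers too long for an integer are left intact.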

How to use sub in R with numeric operations on the matches?

Let's say I want to change the string X0_Y1_Z2 into X0_Y1_Z1, i.e. to decrease the last number by one. I tried the following statement in R, which doesn't work:
sub("(\\S+_\\S+_)\\S(\\d)", paste0("\\1", as.numeric("\\2")-1), "X0_Y1_Z2", perl=T)
How can I do it?
If the string always has this format and you only have a single trailing digit to decrement, a simple substring is enough:
> s <- "X0_Y1_Z2"
> paste0(substring(s, 1, nchar(s)-1), as.numeric(substring(s, nchar(s))) - 1)
[1] "X0_Y1_Z1"
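Your attempt doesn't work because paste0(...) is evaluated before sub() runs: as.numeric("\\2") is just NA, so sub() only ever sees the replacement "\\1NA". To match the last run of digits in a string, use the [0-9]+$ regex. To do arithmetic on the match (here, decrease it), use the gsubfn package. See an example: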
> library(gsubfn)
> s <- "X0_Y1_Z2"
> gsubfn('[0-9]+$', ~ as.numeric(x)-1, s)
[1] "X0_Y1_Z1"
If you need to validate the string the way you did, use more groups; the anchors ^ and $ require the whole string to match the pattern (a "full string match"):
> p <- "^(\\S+_\\S+_\\S)(\\d+)$"
> gsubfn(p, function(x1,x2) paste0(x1, as.numeric(x2)-1), s)
[1] "X0_Y1_Z1"
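If adding a package is not an option, the same "find the trailing number, compute, write it back" step can be sketched in base R with regexpr() and regmatches<-. The function name decrement_last_number() is hypothetical; sprintf("%.0f", ...) is used instead of as.character() because as.character(100000) gives "1e+05".

```r
# Decrease the trailing number of each string by one; strings without a
# trailing number are returned unchanged.
decrement_last_number <- function(s) {
  m <- regexpr("[0-9]+$", s)              # -1 where s has no trailing digits
  nums <- regmatches(s, m)
  regmatches(s, m) <- sprintf("%.0f", as.numeric(nums) - 1)
  s
}

decrement_last_number(c("X0_Y1_Z2", "A_B_10", "no_digits"))
# [1] "X0_Y1_Z1"  "A_B_9"     "no_digits"
```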

Generate vector of a repeated string with incremental suffix number

I would like to generate a vector by repeating the string "Fst" with a number at the end that increments:
"Fst1" "Fst2" "Fst3" "Fst4" ... "Fst100"
An alternative to paste is sprintf, which can be a bit more convenient if, for instance, you wanted to "pad" your digits with leading zeroes.
Here's an example:
sprintf("Fst%d", 1:10) ## No padding
# [1] "Fst1" "Fst2" "Fst3" "Fst4" "Fst5"
# [6] "Fst6" "Fst7" "Fst8" "Fst9" "Fst10"
sprintf("Fst%02d", 1:10) ## Pads anything less than two digits with zero
# [1] "Fst01" "Fst02" "Fst03" "Fst04" "Fst05"
# [6] "Fst06" "Fst07" "Fst08" "Fst09" "Fst10"
So, for your question, you would be looking at:
sprintf("Fst%d", 1:100) ## or sprintf("Fst%03d", 1:100)
You can use the paste function to create a vector that combines a set character string with incremented numbers: paste0('Fst', 1:100)
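If the upper limit can change, the padding width can be derived from it instead of being hard-coded in "%03d". A sketch using formatC() with the "0" flag; n is stored as an integer so that as.character() never switches to scientific notation:

```r
# Zero-pad each suffix to the number of digits of the largest one.
n <- 100L
width <- nchar(as.character(n))   # 3 digits for n = 100
ids <- paste0("Fst", formatC(seq_len(n), width = width, flag = "0"))
head(ids, 3)
# [1] "Fst001" "Fst002" "Fst003"
tail(ids, 1)
# [1] "Fst100"
```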

Extract first X Numbers from Text Field using Regex

I have strings that look like this:
x <- c("P2134.asfsafasfs","P0983.safdasfhdskjaf","8723.safhakjlfds")
I need to end up with:
"2134", "0983", and "8723"
Essentially, I need to extract the first four characters that are digits from each element. Some begin with a letter, which rules out a simple substr() call.
I guess technically, I could do something like:
x <- gsub("^P","",x)
x <- substr(x,1,4)
But I want to know how I would do this with regex!
You could use str_match from the stringr package:
library(stringr)
print(c(str_match(x, "\\d\\d\\d\\d")))
# [1] "2134" "0983" "8723"
You can do this with sub (or gsub) too:
> sub('.?([0-9]{4}).*', '\\1', x)
[1] "2134" "0983" "8723"
I used sub instead of gsub to make sure I only got the first match. .? means any single character, and it's optional (plain . would not match the case without the leading P). The parentheses define a group that I reference in the replacement as '\\1'; if there were multiple sets of parentheses I could reference them too with '\\2' and so on. Inside the group (and you had this syntax correct) I want only digits, exactly 4 of them. The final .* matches zero or more trailing characters of any type.
Your syntax was working, but you were replacing something with itself, so you wound up with the same output.
This will get you the first four digits of a string, regardless of where in the string they appear.
mapply(function(x, m) paste0(x[m], collapse=""),
strsplit(x, ""),
lapply(gregexpr("\\d", x), "[", 1:4))
Breaking the line above down into pieces:
# this will get you a list of matches of digits, and their location in each x
matches <- gregexpr("\\d", x)
# this gets you each individual digit
matches <- lapply(matches, "[", 1:4)
# individual characters of x
splits <- strsplit(x, "")
# get the appropriate string
mapply(function(x, m) paste0(x[m], collapse=""), splits, matches)
Another group-capturing approach, which doesn't assume exactly 4 digits:
x <- c("P2134.asfsafasfs","P0983.safdasfhdskjaf","8723.safhakjlfds")
gsub("(^[^0-9]*)(\\d+)([^0-9].*)", "\\2", x)
## [1] "2134" "0983" "8723"
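A base-R sketch with regexpr() and regmatches(); the helper name first_digits() is hypothetical. regmatches() silently drops elements that have no match, so the result is written into an NA-filled vector to stay aligned with x. Note that "[0-9]{4}" looks for four consecutive digits, whereas the mapply() answer collects the first four digits even when other characters sit between them.

```r
# Return the first run of exactly k consecutive digits in each string,
# or NA where there is none.
first_digits <- function(x, k = 4) {
  m <- regexpr(sprintf("[0-9]{%d}", k), x)
  out <- rep(NA_character_, length(x))
  out[which(m > 0)] <- regmatches(x, m)
  out
}

x <- c("P2134.asfsafasfs", "P0983.safdasfhdskjaf", "8723.safhakjlfds", "no digits")
first_digits(x)
# [1] "2134" "0983" "8723" NA
```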
