format numeric without leading zero - r

What's the best way to format a numeric so that it does NOT show a leading zero? For example:
test = .006
sprintf/format/formatC( ??? ) # should result in ".006"

I believe I answered this once before but can't find it. You cannot tell sprintf() et al. about a format that drops the leading zero, so you have to do it yourself, e.g. via substring():
R> val <- 0.006
R> aa <- substring(sprintf("%4.3f", val), 2)
R> aa
[1] ".006"
R>
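One limitation of substring() here is that it drops the first character unconditionally, so a negative value would lose its sign. Below is a minimal sketch of a sign-aware, vectorized variant. The name drop_leading_zero and its digits argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: format with a fixed number of decimals,
# then remove a zero that sits directly before the decimal point.
drop_leading_zero <- function(x, digits = 3) {
  s <- sprintf(paste0("%.", digits, "f"), x)
  # keep an optional minus sign (group 1), drop the "0" before the "."
  sub("^(-?)0\\.", "\\1.", s)
}

drop_leading_zero(c(0.006, -0.25, 1.5))
# [1] ".006"  "-.250" "1.500"
```

Values with a non-zero integer part are left alone because the pattern only matches a zero at the start of the string.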

f <- function(x) gsub("^(\\s*[+-]?)0\\.", "\\1.", as.character(x))
f(0.006)
# ".006"
f(-0.006)
# "-.006"
f("+0.006")
# "+.006"
f(" 0.006")
# " .006"
f(10.05)
# "10.05"

You can always fix it up yourself with regular expression search-and-replace:
library(stringr)
test = .006
str_replace(as.character(test), "^0\\.", ".")
Not the most elegant answer, but it works. Substitute whatever string conversion you like for as.character, such as sprintf with your preferred floating point format.
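The choice of string conversion matters more than it may seem: as.character() switches to scientific notation for small values, and then there is no "0." for the regex to find. A small sketch, using base R's sub() in place of str_replace() purely to avoid the package dependency:

```r
x <- 0.00006

# as.character() uses scientific notation here, so the pattern cannot match
as.character(x)
# [1] "6e-05"

# a fixed-point format keeps the "0." prefix that the regex removes
fixed <- sprintf("%.5f", x)
result <- sub("^0\\.", ".", fixed)
result
# [1] ".00006"
```

The number of decimals (5 here) is an assumption you have to pick for your own data.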

Related

Remove first "." from values in R

I have a dataset with different values in R. Some values are like 11.474 and others like 1.034.496 in the same column. I would like to change the values with two dots from 1.034.496 to 1034.496. Is there anyone who could help me please?
Thanks for the help!
Use gsub with Perl regexes:
df <- data.frame(a = c('11.474', '1.034.496', '1.234.034.496'))
df$a = gsub('[.](?=.*[.])', '', df$a, perl = TRUE)
print(df)
##             a
## 1      11.474
## 2    1034.496
## 3 1234034.496
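Here, [.](?=.*[.]) matches a literal dot (which has to be escaped as \. or put into a character class as [.]) only when another literal dot appears somewhere after it. That condition is checked with a positive lookahead, (?=PATTERN), which requires perl = TRUE. The last dot in each value has no dot after it, so it is kept.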
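If the goal is a numeric column rather than cleaned-up strings, the same pattern can be wrapped in a small helper. This is only a sketch; the function name dots_to_numeric is invented for illustration.

```r
# Hypothetical helper: drop every dot except the last one, then convert.
dots_to_numeric <- function(v) {
  cleaned <- gsub("[.](?=.*[.])", "", v, perl = TRUE)
  as.numeric(cleaned)
}

vals <- dots_to_numeric(c("11.474", "1.034.496", "1.234.034.496"))
vals
# [1]      11.474    1034.496 1234034.496
```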
There are probably smarter regex approaches than the one below, but here is my attempt. Note that sub() removes only the first dot, so this handles values with at most two dots:
> ifelse(lengths(gregexpr("\\.",v))>1,sub("\\.","",v),v)
[1] "11.474" "1034.496"
where
v <- c("11.474","1.034.496")

splitting text into character and numeric

Could someone help me split this string:
string <- "Rolling in the deep $15.25"
I'm trying to get two outputs out of this:
1) Rolling in the Deep # character
2) 15.25 # numeric value
I know how to do this in excel but a bit lost with R
strsplit will do the trick:
string <- "Rolling in the deep $15.25"
# "\\s+" matches one or more whitespace characters;
# "\\$" matches a literal $ (escaped because a bare $ anchors the end of the string)
strsplit(string, "\\s+\\$")
# Result
# [[1]]
# [1] "Rolling in the deep" "15.25"
strsplit(string, "\\s+\\$")[[1]][1]
#[1] "Rolling in the deep"
strsplit(string, "\\s+\\$")[[1]][2]
#[1] "15.25"
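Since the question asks for a numeric value, the second piece still needs converting. A short sketch of that last step (the variable names are just for illustration):

```r
string <- "Rolling in the deep $15.25"

# split once, then pull out the two pieces
parts <- strsplit(string, "\\s+\\$")[[1]]
title <- parts[1]               # character
price <- as.numeric(parts[2])   # numeric

title
# [1] "Rolling in the deep"
price
# [1] 15.25
```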
As long as the right-hand side is always preceded by a dollar sign, you can split on it, but you need to "escape" the dollar sign because it is a regex metacharacter. Try this:
# you will need stringr, which you could load alone but the tidyverse is amazing
library(tidyverse)
string <- "Rolling in the deep $15.25"
str_split_fixed(string, "\\$", n = 2)
Here's how you can extract the information using only regular expressions:
x <- c("Rolling in the deep $15.25",
       "Apetite for destruction $20.00",
       "Piece of mind $19")
# group 1: everything before the whitespace; group 2: the $ and what follows
rgx <- "^(.*)\\s+(\\$.*)$"
data.frame(album = trimws(gsub(rgx, "\\1", x)),
           price = trimws(gsub(rgx, "\\2", x))
)
                    album  price
1     Rolling in the deep $15.25
2 Apetite for destruction $20.00
3           Piece of mind    $19
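A variation on the same idea, sketched with regexec() and regmatches() so that both capture groups come back from a single match and the price ends up numeric. The pattern and the column names are assumptions for illustration, not taken from the answer above.

```r
x <- c("Rolling in the deep $15.25",
       "Apetite for destruction $20.00",
       "Piece of mind $19")

# group 1: title (lazy), group 2: digits and dots after the dollar sign
m <- regmatches(x, regexec("^(.*?)\\s*\\$([0-9.]+)$", x, perl = TRUE))

# each element of m is c(full match, group 1, group 2)
res <- data.frame(
  album = vapply(m, `[`, character(1), 2),
  price = as.numeric(vapply(m, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)
res$price
# [1] 15.25 20.00 19.00
```

This assumes every element matches the pattern; an element that does not match would make vapply() fail, which is arguably what you want for malformed input.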

Extracting and matching regular expressions in R

I have a list of strings, an example is shown below (the actual list has a much bigger variety in format)
[1] "AB-123"
[2] "AB-312"
[3] "AB-546"
[4] "ZXC/123456"
Assuming [1] is the correct format, I want to extract the regular expression from [1] and match it against the rest to detect that [4] is inconsistent. Is there a method to do this or is there a better way to achieve the same outcome?
*EDIT - I found something close to what I require. Does anyone know of any packages that do this?
Given a string, generate a regex that can parse *similar* strings
We can use grepl to check whether the other elements contain the prefix of the first one:
grepl(sub("-.*", "", v1[1]), v1[-1])
data
v1 <- c( "AB-123" , "AB-312" , "AB-546" , "ZXC/123456")
Here's an attempt at making a function which checks whether each character of each value is a Character, Digit, or Other. It is a bit rough, but I'm sure this can be expanded upon to match exactly what you want:
test <- c("AB-123", "AB-312", "AB-546", "ZXC/123456")
compare_1st <- function(x) {
x <- toupper(x)
chars <- list("A",1,"-")
repl <- c("[A-Z]", "[0-9]", "[^0-9A-Z]")
for(i in seq_along(repl)) x <- gsub(repl[i], chars[i], x)
out <- x[1] == x
attr(out, "values") <- chartr("A1-", "CDO", x)
out
}
compare_1st(test)
#[1] TRUE TRUE TRUE FALSE
#attr(,"values")
#[1] "CCODDD" "CCODDD" "CCODDD" "CCCODDDDDD"
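Closer to the original request of deriving a regex from the first element, here is a rough sketch that turns a template string into a pattern, one character at a time. The function name make_pattern is hypothetical, and it assumes that letters and digits are the only classes worth generalizing; everything else is matched literally via \Q...\E, which needs perl = TRUE.

```r
# Hypothetical helper: build an anchored regex from a template string.
make_pattern <- function(template) {
  chars <- strsplit(template, "")[[1]]
  classes <- ifelse(grepl("[A-Za-z]", chars), "[A-Za-z]",
             ifelse(grepl("[0-9]", chars), "[0-9]",
                    paste0("\\Q", chars, "\\E")))  # literal for anything else
  paste0("^", paste(classes, collapse = ""), "$")
}

v1 <- c("AB-123", "AB-312", "AB-546", "ZXC/123456")
ok <- grepl(make_pattern(v1[1]), v1, perl = TRUE)
ok
# [1]  TRUE  TRUE  TRUE FALSE
```

Because the pattern is anchored and built per character, it also enforces the length of the template, which may or may not be what you want.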

Adding leading 0s in r

I have a large data frame that is filled with characters such as:
x <- c("Y188","Y204" ,"Y221","EP121_1" ,"Y233" , "Y248" ,"Y268", "BB2","BB20",
"BB32" ,"BB044" ,"BB056" , "Y234" , "Y249" ,"Y271" ,"BB3", "BB21", "BB33",
"BB045","BB057" ,"Y236", "Y250", "Y272" , "BB4", "BB22" )
As you can see, certain tags such as BB20 only have two digits. I would like every tag to have at least 3 digits, like this (the issue is only in the BB tags, if that helps):
Y188, Y204, Y221, EP121_1, Y233, Y248, Y268, BB002, BB020, BB032, BB044,
BB056, Y234, Y249, Y271, BB003, BB021, BB033, BB045, BB057, Y236, Y250,
Y272, BB004, BB022
I've looked into the sprintf and formatC functions but am still having no luck.
A forceful approach with a nested gsub call:
gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2",
gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", x))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020"
# [10] "BB032" "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033"
# [19] "BB045" "BB057" "Y236" "Y250" "Y272" "BB004" "BB022"
There is surely a more general way to do this, but for such a localized task, two simple sub calls can be enough: add one leading zero for two-digit numbers, two leading zeros for one-digit numbers.
x <- sub("^BB(\\d{1})$","BB00\\1",x)
x <- sub("^BB(\\d{2})$","BB0\\1",x)
This works, but it will have edge cases:
# indicator for numeric of length less than three
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
# overwrite relevant values with the reformatted ones
x[id] <- paste0(gsub("[0-9]", "", x)[id],
formatC(as.numeric(num[id]), width = 3, flag = "0"))
[1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020" "BB032"
[11] "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033" "BB045" "BB057"
[21] "Y236" "Y250" "Y272" "BB004" "BB022"
It can be done using the sprintf and gsub functions. This step extracts the numeric values and changes their format:
num=sprintf("%03d",as.numeric(gsub("[^[:digit:]]", "", x)))
The next step is to paste the reformatted numbers back onto the letters:
x=paste(gsub("[^[:alpha:]]", "", x),num,sep="")
Note that this strips every non-letter character from the prefix, so EP121_1 becomes EP1211.
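For comparison, here is a sketch that only touches tags of the form "letters followed by digits" and leaves everything else (such as EP121_1) unchanged. The helper name pad_tag and its width argument are invented for illustration.

```r
# Hypothetical helper: zero-pad the numeric suffix of simple tags.
pad_tag <- function(x, width = 3) {
  m <- regmatches(x, regexec("^([A-Za-z]+)([0-9]+)$", x))
  vapply(seq_along(x), function(i) {
    parts <- m[[i]]
    if (length(parts) == 0) return(x[i])  # no match: leave the tag as it is
    paste0(parts[2],
           formatC(as.integer(parts[3]), width = width, flag = "0"))
  }, character(1))
}

padded <- pad_tag(c("Y188", "EP121_1", "BB2", "BB20", "BB044"))
padded
# [1] "Y188"    "EP121_1" "BB002"   "BB020"   "BB044"
```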

gsub and pad inside of a parenthesis

I have a vector like this:
x <- c("20(0.23)", "15(0.2)", "16(0.09)")
and I don't want to mess with the numbers outside the parentheses, but I want to remove the leading zero on the numbers inside and make them all have 2 decimal digits. The output will look like:
"20(.23)", "15(.20)", "16(.09)"
Useful information:
I can remove leading zero and retain 2 digits using the function below taken from: LINK
numformat <- function(val) { sub("^(-?)0.", "\\1.", sprintf("%.2f", val)) }
numformat(c(0.2, 0.26))
#[1] ".20" ".26"
I know gsub can be used but I don't know how. I'll provide a strsplit answer but that's hackish at best.
The gsubfn package allows you to replace anything matched by a regex with a function applied to the match. So we could use what you have with your numformat function
library(gsubfn)
# Note that I added as.numeric in because what will be passed in
# is a character string
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.2f", as.numeric(val)))}
gsubfn("0\\.\\d+", numformat, x)
#[1] "20(.23)" "15(.20)" "16(.09)"
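If you would rather not add a package dependency, roughly the same replace-by-function idea can be sketched in base R with gregexpr() and the replacement form of regmatches(). The variable names here are only illustrative.

```r
x <- c("20(0.23)", "15(0.2)", "16(0.09)")
y <- x  # work on a copy so the input stays intact

# locate every "0.<digits>" occurrence in each element
m <- gregexpr("0\\.[0-9]+", y)

# replace each match with its reformatted version
regmatches(y, m) <- lapply(regmatches(y, m), function(v) {
  sub("^0\\.", ".", sprintf("%.2f", as.numeric(v)))
})
y
# [1] "20(.23)" "15(.20)" "16(.09)"
```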
pad.fix<-function(x){
y<-gsub('\\.(\\d)\\)','\\.\\10\\)',x)
gsub('0\\.','\\.',y)
}
The first gsub adds a trailing zero if needed; the second gsub removes the leading zero.
That is yet another of these Tyler questions that seem to be complicated just for complication's sake :)
So here you go:
R> x <- c("20(0.23)", "15(0.2)", "16(0.09)")
R> sapply(strsplit(gsub("^(\\d+)\\((.*)\\)$", "\\1 \\2", x), " "),
+ function(x) sprintf("%2d(.%02d)",
+ as.numeric(x[1]),
+ as.numeric(x[2])*100))
[1] "20(.23)" "15(.20)" "16(.09)"
R>
We do a few things here:
The gsub() picks off the two numbers: first the one before the parens, then the one inside the parens. [With hindsight, should have picked after the decimal, see below.]
This prints them out just with whitespace, e.g. "20 0.23" for the first.
We then use a standard strsplit() on this.
We then use sapply to process the list we get from strsplit
We print the first number as a two-digit int.
The second one is more tricky -- the (s)printf() family cannot suppress a leading zero, so we print the decimal point ourselves and then print two digits of an integer -- and convert the second number accordingly.
It is all concise and in one line, but it would be clearer broken out.
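Broken out step by step, the same approach might look like the sketch below. The intermediate names are made up, and the round() call is an addition that is not in the one-liner: %d needs a whole number, and multiplying a decimal by 100 does not always give one exactly in floating point.

```r
x <- c("20(0.23)", "15(0.2)", "16(0.09)")

# Step 1: rewrite "20(0.23)" as "20 0.23"
spaced <- gsub("^(\\d+)\\((.*)\\)$", "\\1 \\2", x)

# Step 2: split each element on the space
parts <- strsplit(spaced, " ")

# Step 3: format one pair; the decimal point is printed literally
format_pair <- function(p) {
  outside <- as.integer(p[1])
  inside  <- as.integer(round(as.numeric(p[2]) * 100))  # round() is an added safeguard
  sprintf("%2d(.%02d)", outside, inside)
}

res <- vapply(parts, format_pair, character(1))
res
# [1] "20(.23)" "15(.20)" "16(.09)"
```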
Edit: I don't often provide the fastest solutions, but when I do, at least I can gloat:
R> dason <- function(x) { numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.2f", as.numeric(val)))}; gsubfn("0\\.\\d+", numformat, x) }
R> dirk <- function(x) { sapply(strsplit(gsub("^(\\d+)\\((.*)\\)$", "\\1 \\2", x), " "), function(x) sprintf("%2d(.%02d)", as.numeric(x[1]), as.numeric(x[2])*100)) }
R>
R> dason(x)
[1] "20(.23)" "15(.20)" "16(.09)"
R> dirk(x)
[1] "20(.23)" "15(.20)" "16(.09)"
R>
R> res <- benchmark(dason(x), dirk(x), replications=1000, order="relative")
R> res
      test replications elapsed relative user.self sys.self user.child sys.child
2  dirk(x)         1000   0.133    1.000     0.132    0.000          0         0
1 dason(x)         1000   2.026   15.233     1.960    0.064          0         0
R>
So that's about 15 times faster. Not that it matters in this context, but speed never hurt anyone in the long run.
Non gsub answer that's ugly at best.
x <- c("20(0.23)", "15(0.2)", "16(0.09)")
numformat <- function(val) { sub("^(-?)0.", "\\1.", sprintf("%.2f", val)) }
z <- do.call(rbind, strsplit(gsub("\\)", "", x), "\\("))
z[, 2] <- numformat(as.numeric(z[, 2]))
paste0(z[, 1], "(", z[, 2], ")")
