What's the best way to format a numeric so that it does NOT show a leading zero? For example:
test = .006
sprintf/format/formatC( ??? ) # should result in ".006"
I believe I answered this once before but can't find it. You cannot tell sprintf() et al. about a format that drops the leading zero, so you have to do it yourself, e.g. via substring() (note that this simple version assumes a non-negative value below 1):
R> val <- 0.006
R> aa <- substring(sprintf("%4.3f", val), 2)
R> aa
[1] ".006"
R>
f <- function(x) gsub("^(\\s*[+-]?)0\\.", "\\1.", as.character(x))
f(0.006)
# ".006"
f(-0.006)
# "-.006"
f("+0.006")
# "+.006"
f(" 0.006")
# " .006"
f(10.05)
# "10.05"
You can always fix it up yourself with regular expression search-and-replace:
library(stringr)
test = .006
str_replace(as.character(test), "^0\\.", ".")
Not the most elegant answer, but it works. Substitute whatever string conversion you like for as.character, such as sprintf with your preferred floating point format.
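As a minimal sketch of that substitution in base R, the conversion can be done with sprintf and the zero stripped with sub. The helper name drop_leading_zero and its digits argument are made up for this illustration and are not part of any package:

```r
# Hypothetical helper: format with a fixed number of decimals,
# then drop the zero that precedes the decimal point (keeping any minus sign).
drop_leading_zero <- function(x, digits = 3) {
  s <- sprintf(paste0("%.", digits, "f"), x)
  sub("^(-?)0\\.", "\\1.", s)
}

drop_leading_zero(c(0.006, -0.25, 10.05))
# [1] ".006"   "-.250"  "10.050"
```

Values of 1 or more in absolute value are left alone, because the pattern only matches a zero that sits directly before the decimal point at the start of the string.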
I have a dataset with different values in R. Some values are like 11.474 and others like 1.034.496 in the same column. I would like to change the values with two dots from 1.034.496 to 1034.496. Is there anyone who could help me please?
Thanks for the help!
Use gsub with Perl regexes:
df <- data.frame(a = c('11.474', '1.034.496', '1.234.034.496'))
df$a = gsub('[.](?=.*[.])', '', df$a, perl = TRUE)
print(df)
##             a
## 1      11.474
## 2    1034.496
## 3 1234034.496
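Here, [.](?=.*[.]) matches a literal dot (which has to be escaped as \. or put into a character class, [.]) only when another literal dot follows somewhere later in the string. That condition is checked with a positive lookahead, (?=PATTERN), which does not consume any characters, so every dot except the last one is removed.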
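To see which dots the lookahead selects, one can count the matches per element before removing them. This is only a sketch on a plain character vector; the variable names are chosen for illustration:

```r
v <- c("11.474", "1.034.496", "1.234.034.496")

# Locate every dot that still has another dot somewhere after it
dots <- gregexpr("[.](?=.*[.])", v, perl = TRUE)
n_removed <- lengths(regmatches(v, dots))
n_removed
# [1] 0 1 2

# Remove those dots and convert the result to numeric
cleaned <- gsub("[.](?=.*[.])", "", v, perl = TRUE)
values <- as.numeric(cleaned)
```

The last dot in each string never matches, so it survives as the decimal separator.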
I guess there must be smarter regex approaches than the one below, but here is my attempt:
> ifelse(lengths(gregexpr("\\.",v))>1,sub("\\.","",v),v)
[1] "11.474" "1034.496"
where
v <- c("11.474","1.034.496")
Could someone help me split this string:
string <- "Rolling in the deep $15.25"
I'm trying to get two outputs out of this:
1) Rolling in the Deep # character
2) 15.25 # numeric value
I know how to do this in Excel but am a bit lost with R.
Using strsplit will do the trick:
string <- "Rolling in the deep $15.25"
strsplit(string, "\\s+\\$")
#                 ^   ^___ find a $ (escaped with \\ because $ otherwise means end of string)
#                 \______ find 1 or more whitespace characters
# Result
#"Rolling in the deep" "15.25"
strsplit(string, "\\s+\\$")[[1]][1]
#[1] "Rolling in the deep"
strsplit(string, "\\s+\\$")[[1]][2]
#[1] "15.25"
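The question asked for the second piece as a numeric value, while strsplit returns character strings. A small sketch of the extra conversion step, with title and price as illustrative variable names:

```r
string <- "Rolling in the deep $15.25"

# Split on optional whitespace followed by a literal dollar sign
parts <- strsplit(string, "\\s*\\$")[[1]]

title <- parts[1]              # character
price <- as.numeric(parts[2])  # numeric
```

This assumes the string contains exactly one dollar sign; with more than one, parts would have more than two elements.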
As long as the right-hand side is always preceded by a dollar sign, you can split on it. You will need to "escape" the dollar sign, though. Try this:
# you will need stringr, which you could load alone but the tidyverse is amazing
library(tidyverse)
string <- "Rolling in the deep $15.25"
str_split_fixed(string, "\\$", n = 2)
Here's how you can extract the information using only regular expressions:
x <- c("Rolling in the deep $15.25",
"Apetite for destruction $20.00",
"Piece of mind $19")
rgx <- "^(.*)\\s+(\\$.*)$"
data.frame(album = trimws(gsub(rgx, "\\1", x)),
           price = trimws(gsub(rgx, "\\2", x))
)
                    album  price
1     Rolling in the deep $15.25
2 Apetite for destruction $20.00
3           Piece of mind    $19
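If the capture groups themselves are wanted rather than two separate substitutions, base R's regexec and regmatches return them in one pass. The following is a sketch with a slightly different pattern of my own (it excludes the dollar sign from the second group so the price can be converted to numeric); the names m, parts and albums are illustrative:

```r
x <- c("Rolling in the deep $15.25",
       "Apetite for destruction $20.00",
       "Piece of mind $19")

# Group 1: title ending in a non-space character; group 2: digits and dots after the $
m <- regmatches(x, regexec("^(.*\\S)\\s*\\$([0-9.]+)$", x))

# Each list element is c(full match, group 1, group 2); stack them into a matrix
parts <- do.call(rbind, m)

albums <- data.frame(album = parts[, 2],
                     price = as.numeric(parts[, 3]))
```

Strings that do not match the pattern would yield an empty element in m, so this sketch assumes every input has the "title $price" shape.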
I have a list of strings, an example is shown below (the actual list has a much bigger variety in format)
[1] "AB-123"
[2] "AB-312"
[3] "AB-546"
[4] "ZXC/123456"
Assuming [1] is the correct format, I want to extract the regular expression from [1] and match it against the rest to detect that [4] is inconsistent. Is there a method to do this or is there a better way to achieve the same outcome?
*EDIT - I found something close to what I require. Does anyone know of any packages that do this?
Given a string, generate a regex that can parse *similar* strings
We may need grepl, using the prefix of the first element as the pattern:
grepl(sub("-.*", "", v1[1]), v1[-1])
data
v1 <- c( "AB-123" , "AB-312" , "AB-546" , "ZXC/123456")
Here's an attempt at a function which classifies each character of every value as a Character (letter), Digit or Other, and then compares the resulting shape with that of the first value. It is a bit rough, but I'm sure it can be expanded to match exactly what you want:
test <- c("AB-123", "AB-312", "AB-546", "ZXC/123456")
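compare_1st <- function(x) {
  x <- toupper(x)
  # stand-in characters for each class: letter, digit, other
  chars <- c("A", "1", "-")
  repl <- c("[A-Z]", "[0-9]", "[^0-9A-Z]")
  for (i in seq_along(repl)) x <- gsub(repl[i], chars[i], x)
  # TRUE where a value has the same shape as the first value
  out <- x[1] == x
  # readable shape: C = character, D = digit, O = other
  attr(out, "values") <- chartr("A1-", "CDO", x)
  out
}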
compare_1st(test)
#[1] TRUE TRUE TRUE FALSE
#attr(,"values")
#[1] "CCODDD" "CCODDD" "CCODDD" "CCCODDDDDD"
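The same idea can be pushed one step further to produce an actual regular expression from the reference string, which is closer to what the question asks for. This is only a rough sketch: shape_regex is a hypothetical helper, it treats letters and digits as classes and everything else as a literal, and it relies on PCRE's \Q...\E quoting, so the result must be used with perl = TRUE:

```r
# Hypothetical helper: derive a regex describing the "shape" of one string
shape_regex <- function(s) {
  ch <- strsplit(s, "", fixed = TRUE)[[1]]
  # One token per character: a class for letters/digits, a quoted literal otherwise
  token <- ifelse(grepl("^[A-Za-z]$", ch, perl = TRUE), "[A-Za-z]",
           ifelse(grepl("^[0-9]$", ch, perl = TRUE), "[0-9]",
                  paste0("\\Q", ch, "\\E")))
  # Collapse runs of identical tokens into token{n}
  runs <- rle(token)
  counts <- ifelse(runs$lengths > 1, paste0("{", runs$lengths, "}"), "")
  paste0("^", paste0(runs$values, counts, collapse = ""), "$")
}

v1 <- c("AB-123", "AB-312", "AB-546", "ZXC/123456")
pattern <- shape_regex(v1[1])
consistent <- grepl(pattern, v1, perl = TRUE)
consistent
# [1]  TRUE  TRUE  TRUE FALSE
```

Because run lengths are matched exactly, a value such as "ABC-123" would also be flagged; loosening the counts to ranges is a design choice left open here.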
I have a large data frame that is filled with characters such as:
x <- c("Y188","Y204" ,"Y221","EP121_1" ,"Y233" , "Y248" ,"Y268", "BB2","BB20",
"BB32" ,"BB044" ,"BB056" , "Y234" , "Y249" ,"Y271" ,"BB3", "BB21", "BB33",
"BB045","BB057" ,"Y236", "Y250", "Y272" , "BB4", "BB22" )
As you can see, certain tags such as BB20 only have two digits. I would like every tag to have at least 3 digits, like this (the issue is only in the BB tags, if that helps):
Y188, Y204, Y221, EP121_1, Y233, Y248, Y268, BB002, BB020, BB032, BB044,
BB056, Y234, Y249, Y271, BB003, BB021, BB033, BB045, BB057, Y236, Y250,
Y272, BB004, BB022
I've looked into the sprintf and formatC functions but am still having no luck.
A forceful approach with a nested gsub call:
gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2",
gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", x))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020"
# [10] "BB032" "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033"
# [19] "BB045" "BB057" "Y236" "Y250" "Y272" "BB004" "BB022"
There is surely a more general way to do this, but for such a localized task two simple sub calls can be enough: add one leading zero for two-digit numbers and two leading zeros for one-digit numbers.
x <- sub("^BB(\\d{1})$","BB00\\1",x)
x <- sub("^BB(\\d{2})$","BB0\\1",x)
This works for the example, but will have edge cases (for instance, tags whose digits are split by other characters):
# indicator for numeric of length less than three
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
# overwrite relevant values with the reformatted ones
x[id] <- paste0(gsub("[0-9]", "", x)[id],
formatC(as.numeric(num[id]), width = 3, flag = "0"))
[1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020" "BB032"
[11] "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033" "BB045" "BB057"
[21] "Y236" "Y250" "Y272" "BB004" "BB022"
It can be done using the sprintf and gsub functions. This step extracts the numeric values and changes their format:
num=sprintf("%03d",as.numeric(gsub("[^[:digit:]]", "", x)))
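The next step is to paste the letters back together with the reformatted numbers. Note that this drops every non-letter from the prefix, so EP121_1 becomes EP1211: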
x=paste(gsub("[^[:alpha:]]", "", x),num,sep="")
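A variant that leaves tags such as EP121_1 untouched is to pad only the values that consist of letters followed by digits and nothing else. This is a sketch under that assumption; pad_tag and its width argument are names invented for the example:

```r
# Hypothetical helper: zero-pad the trailing number of "letters + digits" tags
pad_tag <- function(x, width = 3) {
  hit <- grepl("^[A-Za-z]+[0-9]+$", x)               # only pure letter+digit tags
  prefix <- sub("[0-9]+$", "", x[hit])               # the letters
  number <- as.integer(sub("^[A-Za-z]+", "", x[hit]))  # the digits as an integer
  x[hit] <- paste0(prefix, formatC(number, width = width, flag = "0"))
  x
}

pad_tag(c("Y188", "EP121_1", "BB2", "BB20", "BB044"))
# [1] "Y188"    "EP121_1" "BB002"   "BB020"   "BB044"
```

One caveat: a tag that already carries more leading zeros than width, such as "BB0044", would be shortened to "BB044" because the digits pass through as.integer.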
I have a vector like this:
x <- c("20(0.23)", "15(0.2)", "16(0.09)")
and I don't want to mess with the numbers outside the parentheses, but I want to remove the leading zero on the numbers inside and give them all two decimal places. The output will look like:
"20(.23)", "15(.20)", "16(.09)"
Useful information:
I can remove leading zero and retain 2 digits using the function below taken from: LINK
numformat <- function(val) { sub("^(-?)0\\.", "\\1.", sprintf("%.2f", val)) }
numformat(c(0.2, 0.26))
#[1] ".20" ".26"
I know gsub can be used but I don't know how. I'll provide a strsplit answer but that's hackish at best.
The gsubfn package allows you to replace anything matched by a regex with a function applied to the match. So we could use what you have with your numformat function
library(gsubfn)
# Note that I added as.numeric in because what will be passed in
# is a character string
numformat <- function(val){sub("^(-?)0\\.", "\\1.", sprintf("%.2f", as.numeric(val)))}
gsubfn("0\\.\\d+", numformat, x)
#[1] "20(.23)" "15(.20)" "16(.09)"
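For readers who prefer to avoid the extra package, base R can do a similar "replace each match with a function of the match" through gregexpr and the replacement form of regmatches. This is a sketch of that alternative rather than the approach above; y and m are illustrative names:

```r
x <- c("20(0.23)", "15(0.2)", "16(0.09)")

# Format as two decimals and drop the zero before the decimal point
numformat <- function(val) {
  sub("^(-?)0\\.", "\\1.", sprintf("%.2f", as.numeric(val)))
}

y <- x
m <- gregexpr("0\\.[0-9]+", y)                          # locate each 0.xx inside the strings
regmatches(y, m) <- lapply(regmatches(y, m), numformat)  # swap in the reformatted text
y
# [1] "20(.23)" "15(.20)" "16(.09)"
```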
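pad.fix <- function(x) {
  # pad a single decimal digit with a trailing zero: (0.2) -> (0.20)
  y <- gsub('\\.(\\d)\\)', '.\\10)', x)
  # drop the leading zero inside the parentheses: (0.20) -> (.20)
  gsub('\\(0\\.', '(.', y)
}
The first gsub adds a trailing zero if needed; the second gsub removes the leading zero.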
That is yet another of these Tyler questions that seem to be complicated just for complication's sake :)
So here you go:
R> x <- c("20(0.23)", "15(0.2)", "16(0.09)")
R> sapply(strsplit(gsub("^(\\d+)\\((.*)\\)$", "\\1 \\2", x), " "),
+ function(x) sprintf("%2d(.%02d)",
+ as.numeric(x[1]),
+ as.numeric(x[2])*100))
[1] "20(.23)" "15(.20)" "16(.09)"
R>
We do a few things here:
The gsub() picks off the two numbers: first the one before the parens, then the one inside the parens. [With hindsight, should have picked after the decimal, see below.]
This prints them out just with whitespace, e.g. "20 0.23" for the first.
We then use a standard strsplit() on this.
We then use sapply to process the list we get from strsplit
We print the first number as a two-digit int.
The second one is more tricky -- the (s)printf() family cannot suppress a leading zero, so we print the decimal point ourselves and then print two digits of an integer -- and convert the second number accordingly.
It is all concise and in one line, but it would be clearer broken out.
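Broken out into separate steps, the same idea might look like the sketch below. It is not the original one-liner: it uses sub instead of strsplit, and it adds round() before the integer conversion as a guard of my own, since a product such as 0.29 * 100 is not an exact whole number in floating point. It also assumes the value inside the parentheses lies between 0 and 1:

```r
x <- c("20(0.23)", "15(0.2)", "16(0.09)")

# 1. The number before the opening parenthesis, kept as text
outer_part <- sub("\\(.*$", "", x)

# 2. The number inside the parentheses, converted to numeric
inner_part <- as.numeric(sub("^.*\\((.*)\\)$", "\\1", x))

# 3. Express it in whole hundredths; round() protects against floating-point error
hundredths <- as.integer(round(inner_part * 100))

# 4. Print the decimal point by hand, then two zero-padded integer digits
result <- sprintf("%s(.%02d)", outer_part, hundredths)
result
# [1] "20(.23)" "15(.20)" "16(.09)"
```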
Edit: I don't often provide the fastest solutions, but when I do, at least I can gloat:
R> dason <- function(x) { numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.2f", as.numeric(val)))}; gsubfn("0\\.\\d+", numformat, x) }
R> dirk <- function(x) { sapply(strsplit(gsub("^(\\d+)\\((.*)\\)$", "\\1 \\2", x), " "), function(x) sprintf("%2d(.%02d)", as.numeric(x[1]), as.numeric(x[2])*100)) }
R>
R> dason(x)
[1] "20(.23)" "15(.20)" "16(.09)"
R> dirk(x)
[1] "20(.23)" "15(.20)" "16(.09)"
R>
R> res <- benchmark(dason(x), dirk(x), replications=1000, order="relative")
R> res
      test replications elapsed relative user.self sys.self user.child sys.child
2  dirk(x)         1000   0.133    1.000     0.132    0.000          0         0
1 dason(x)         1000   2.026   15.233     1.960    0.064          0         0
R>
So that's about 15 times faster. Not that it matters in this context, but speed never hurt anyone in the long run.
A strsplit-based answer that's ugly at best:
x <- c("20(0.23)", "15(0.2)", "16(0.09)")
numformat <- function(val) { sub("^(-?)0\\.", "\\1.", sprintf("%.2f", val)) }
z <- do.call(rbind, strsplit(gsub("\\)", "", x), "\\("))
z[, 2] <- numformat(as.numeric(z[, 2]))
paste0(z[, 1], "(", z[, 2], ")")