How to convert character argument (decimal number) to numeric in R? - r

I want to convert, for example, the character string "167009345.8" to the number 167009345.8.
I have tried several approaches, but I keep running into problems.
For example,
x <- "167009345.8"
class(x) <- "numeric"
the output is 167009346.
But I want the decimal number 167009345.8.
I have also tried as.numeric, but I get the same result.
Could you please help me?

options(digits=10)
x<-"167009345.8"
as.numeric(x)
[1] 167009345.8
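
The conversion itself does not drop the decimal part; the default of 7 significant digits when printing is what hides it. A minimal sketch showing that the value is intact, plus two ways to display it without touching the global options (the variable name num is only for illustration):

```r
x <- "167009345.8"
num <- as.numeric(x)

# The stored value was not rounded to the integer that print() shows by default
num == 167009346            # FALSE

# Ask for more significant digits in this one call only
print(num, digits = 10)

# Or build a string with a fixed number of decimals
sprintf("%.1f", num)        # "167009345.8"
```

Setting options(digits=10) as above has the same effect on display, but it applies to everything printed for the rest of the session.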

I had to split a long character string into separate decimal numbers. It was very frustrating. Maybe this can spare some others some time:
string <- "1.23456, -2.34567, -8.90, +0, +99999.9999, -0.0"
# strsplit() returns a list with one element per input string
charlist <- strsplit(string, ",")
# as.numeric() tolerates the leading space left after each comma
vector <- as.numeric(unlist(charlist))
vector
[1] 1.23456 -2.34567 -8.90000 0.00000 99999.99990 0.00000

Related

Adding leading 0s in r

I have a large data frame that is filled with characters such as:
x <- c("Y188","Y204" ,"Y221","EP121_1" ,"Y233" , "Y248" ,"Y268", "BB2","BB20",
"BB32" ,"BB044" ,"BB056" , "Y234" , "Y249" ,"Y271" ,"BB3", "BB21", "BB33",
"BB045","BB057" ,"Y236", "Y250", "Y272" , "BB4", "BB22" )
As you can see, certain tags such as BB20 only have two digits. I would like every tag to have at least 3 digits, like this (the issue is only in the BB tags, if that helps):
Y188, Y204, Y221, EP121_1, Y233, Y248, Y268, BB002, BB020, BB032, BB044,
BB056, Y234, Y249, Y271, BB003, BB021, BB033, BB045, BB057, Y236, Y250,
Y272, BB004, BB022
I've looked into the sprintf and formatC functions but am still having no luck.
A forceful approach with a nested gsub call:
gsub("(.*[A-Z])(\\d{1}$)", "\\100\\2",
gsub("(.*[A-Z])(\\d{2}$)", "\\10\\2", x))
# [1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020"
# [10] "BB032" "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033"
# [19] "BB045" "BB057" "Y236" "Y250" "Y272" "BB004" "BB022"
There is surely a more general way to do this, but for such a localized task, two simple sub calls can be enough: add one leading zero for two-digit numbers, two leading zeros for one-digit numbers.
x <- sub("^BB(\\d{1})$","BB00\\1",x)
x <- sub("^BB(\\d{2})$","BB0\\1",x)
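This works on the example, but it has edge cases: it strips every non-digit character before counting, so tags whose digits are not contiguous (such as EP121_1) are left alone only when they contain three or more digits in total.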
# indicator for numeric of length less than three
num <- gsub("[^0-9]", "", x)
id <- nchar(num) < 3
# overwrite relevant values with the reformatted ones
x[id] <- paste0(gsub("[0-9]", "", x)[id],
formatC(as.numeric(num[id]), width = 3, flag = "0"))
[1] "Y188" "Y204" "Y221" "EP121_1" "Y233" "Y248" "Y268" "BB002" "BB020" "BB032"
[11] "BB044" "BB056" "Y234" "Y249" "Y271" "BB003" "BB021" "BB033" "BB045" "BB057"
[21] "Y236" "Y250" "Y272" "BB004" "BB022"
It can be done with the sprintf and gsub functions. This step extracts the numeric part and changes its format:
num=sprintf("%03d",as.numeric(gsub("[^[:digit:]]", "", x)))
The next step pastes the reformatted numbers back onto the letters. Note that this keeps only letters and digits, so EP121_1 becomes EP1211:
x=paste(gsub("[^[:alpha:]]", "", x),num,sep="")
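
A somewhat more general variant of the approaches above, as a rough sketch: pad only tags that consist of a letter prefix followed by nothing but digits, and leave everything else (such as EP121_1) untouched. The function name pad_digits and its width argument are made up for this illustration.

```r
pad_digits <- function(tags, width = 3) {
  # Split each tag into a letter prefix and a trailing run of digits;
  # tags that do not have exactly that shape yield character(0)
  m <- regmatches(tags, regexec("^([A-Za-z]+)([0-9]+)$", tags))
  vapply(seq_along(tags), function(i) {
    parts <- m[[i]]
    if (length(parts) == 0) return(tags[i])   # no match: keep as is
    digits <- parts[3]
    # Left-pad with zeros using string operations only
    zeros <- strrep("0", max(0, width - nchar(digits)))
    paste0(parts[2], zeros, digits)
  }, character(1))
}

pad_digits(c("BB2", "BB20", "BB044", "Y188", "EP121_1"))
# [1] "BB002"   "BB020"   "BB044"   "Y188"    "EP121_1"
```

Because the padding is done on the digit string itself, the digits are never converted to numbers and any existing leading zeros are preserved.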

Finding number of r's in the vector (Both R and r) before the first u

rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
In the above code, we need to find the number of r's (R and r) in rquote.
You could use substrings.
## find position of first 'u'
u1 <- regexpr("u", rquote, fixed = TRUE)
## get count of all 'r' or 'R' before 'u1'
lengths(gregexpr("r", substr(rquote, 1, u1), ignore.case = TRUE))
# [1] 5
This follows what you ask for in the title of the post. If you want the count of all the "r", case insensitive, then simplify the above to
lengths(gregexpr("r", rquote, ignore.case = TRUE))
# [1] 6
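
One caveat with counting via lengths(gregexpr(...)): when there is no match at all, gregexpr returns -1, so the length is 1 rather than 0. A small sketch that avoids this by counting the extracted matches instead. The helper name count_r_before_u is hypothetical, and counting over the whole string when there is no 'u' is an assumption about the desired behaviour.

```r
count_r_before_u <- function(s) {
  # Position of the first 'u' (-1 if there is none)
  u_pos <- regexpr("u", s, fixed = TRUE)
  # Assumption: with no 'u' present, count over the whole string
  head_part <- if (u_pos > 0) substr(s, 1, u_pos - 1) else s
  # regmatches() returns character(0) when nothing matches, so the count is 0
  lengths(regmatches(head_part, gregexpr("[Rr]", head_part)))
}

count_r_before_u("R's internals are irrefutably intriguing")
# [1] 5
count_r_before_u("xyz")
# [1] 0
lengths(gregexpr("r", "xyz"))
# [1] 1
```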
Then there's always stringi
library(stringi)
## count before first 'u'
stri_count_regex(stri_sub(rquote, 1, stri_locate_first_regex(rquote, "u")[,1]), "r|R")
# [1] 5
## count all R or r
stri_count_regex(rquote, "r|R")
# [1] 6
To get the number of R's before the first u, you need to make an intermediate step. (You probably don't need to. I'm sure akrun knows some incredibly cool regular expression to get the job done, but it won't be as easy to understand as this).
rquote <- "R's internals are irrefutably intriguing"
before_u <- gsub("u[[:print:]]+$", "", rquote)
length(stringr::str_extract_all(before_u, "(R|r)")[[1]])
You may try this,
> length(str_extract_all(rquote, '[Rr]')[[1]])
[1] 6
To get the count of all r's before the first u
> length(str_extract_all(rquote, perl('u.*(*SKIP)(*F)|[Rr]'))[[1]])
[1] 5
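
The perl() wrapper comes from older stringr releases and, as far as I know, is not available in current ones, whose regex engine (ICU) to my knowledge does not support the (*SKIP)(*F) verbs. A sketch of the same pattern in base R, where perl = TRUE selects PCRE:

```r
rquote <- "R's internals are irrefutably intriguing"

# "u.*" consumes everything from the first 'u' to the end of the string,
# then (*SKIP)(*F) discards that match, so only R/r before the 'u' remain
m <- gregexpr("u.*(*SKIP)(*F)|[Rr]", rquote, perl = TRUE)
lengths(regmatches(rquote, m))
# [1] 5
```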
EDIT: I just saw "before the first u". In that case, we can get the position of the first 'u' with either which or match.
Then, using the strsplit output from the OP's code, apply grepl with ignore.case=TRUE to 'chars' up to that position (ind) to get a logical index of the 'R'/'r' characters, and sum it.
ind <- which(chars=='u')[1]
Or
ind <- match('u', chars)
sum(grepl('r', chars[seq(ind)], ignore.case=TRUE))
#[1] 5
Or we can use two substitutions on the original string ('rquote'). The inner sub removes everything from the first u to the end of the string (u.*$), and the outer gsub replaces every character other than R or r ([^Rr]) with ''. nchar then gives the count of the remaining characters.
nchar(gsub('[^Rr]', '', sub('u.*$', '', rquote)))
#[1] 5
Or, if we want to count the 'r' in the entire string, use gregexpr to get the positions of the matching characters in the original string ('rquote') and take the length:
length(gregexpr('[rR]', rquote)[[1]])
#[1] 6

Convert hex to decimal in R

I found out that there is function called .hex.to.dec in the fBasics package.
When I do .hex.to.dec(a), it works.
I have a data frame with a column samp_column consisting of such values:
a373, 115c6, a373, 115c6, 176b3
When I do .hex.to.dec(samp_column), I get this error:
"Error in nchar(b) : 'nchar()' requires a character vector"
When I do .hex.to.dec(as.character(samp_column)), I get this error:
"Error in rep(base.out, 1 + ceiling(log(max(number), base =
base.out))) : invalid 'times' argument"
What would be the best way of doing this?
Use base::strtoi to convert hexadecimal character vectors to integer:
strtoi(c("0xff", "077", "123"))
#[1] 255 63 123
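
Applied to the values from the question, which carry no 0x prefix, the base has to be given explicitly. A minimal sketch, assuming samp_column is a factor (which would explain the nchar() error); the vector below is a stand-in for the real data frame column:

```r
# Stand-in for the data frame column in the question
samp_column <- factor(c("a373", "115c6", "a373", "115c6", "176b3"))

# Convert factor -> character first, then parse as base-16 integers
strtoi(as.character(samp_column), base = 16L)
# [1] 41843 71110 41843 71110 95923
```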
There is a simple and generic way to convert hex <-> other formats using "C/C++ way":
V <- c(0xa373, 0x115c6, 0xa373, 0x115c6, 0x176b3)
sprintf("%d", V)
#[1] "41843" "71110" "41843" "71110" "95923"
sprintf("%.2f", V)
#[1] "41843.00" "71110.00" "41843.00" "71110.00" "95923.00"
sprintf("%x", V)
#[1] "a373" "115c6" "a373" "115c6" "176b3"
As mentioned in @user4221472's answer, strtoi() returns NA for values of 2^31 or larger.
The simplest way around that is to use as.numeric().
V <- c(0xa373, 0x115c6, 0x176b3, 0x25cf40000)
as.numeric(V)
#[1] 41843 71110 95923 10149429248
As @MS Berends noted in the comments, "[a]lso notice that just printing V in the console will already print in decimal."
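
When the hex values arrive as character strings rather than numeric literals, as in the question's data frame column, as.numeric() can still be used if a "0x" prefix is added first. A short sketch; the vector hex is made-up example input based on the values above:

```r
hex <- c("a373", "115c6", "176b3", "25cf40000")

# as.numeric() parses strings that start with "0x" as hexadecimal and
# returns doubles, so values of 2^31 and above do not become NA
dec <- as.numeric(paste0("0x", hex))
dec
# values: 41843 71110 95923 10149429248
```

Doubles represent integers exactly only up to 2^53, so this is not a route to arbitrary-size hex values.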
strtoi() has a limitation of 31 bits. Hex numbers with the high order bit set return NA:
> strtoi('0x7f8cff8b')
[1] 2139946891
> strtoi('0x8f8cff8b')
[1] NA
To get a signed value with 16 bits (assuming two's complement):
temp <- strtoi(value, base=16L)
if (temp>32767){ temp <- temp - 65536 }
In a general form:
max_unsigned <- 65535 #0xFFFF
max_signed <- 32767 #0x7FFF
temp <- strtoi(value, base=16L)
if (temp>max_signed){ temp <- temp - (max_unsigned + 1) }
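
The same idea can be wrapped in a vectorised helper. This is only a sketch: to_signed and its bits argument are hypothetical names, two's complement encoding is assumed, and bits has to stay below 32 because of the strtoi() limit mentioned above.

```r
to_signed <- function(hex, bits = 16L) {
  # Parse as an unsigned base-16 value
  value <- strtoi(hex, base = 16L)
  # Values with the top bit set wrap around to the negative range
  ifelse(value >= 2^(bits - 1), value - 2^bits, value)
}

to_signed(c("7fff", "8000", "ffff"))
# values: 32767 -32768 -1
```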

numeric sort a list of strings in R

I have a list:
a <- ["12file.txt", "8file.txt", "66file.txt"]
I would like to sort by number:
a would be: ["8file.txt", "12file.txt", "66file.txt"]
Now I could get only this:
a = ["12file.txt", "66file.txt", "8file.txt"]
Thanks
I'm assuming you have a character vector:
a <- c("12file.txt", "8file.txt", "66file.txt")
I would approach this by pulling out the number at the start of each string and sorting on that:
num <- as.numeric(sub("([0-9]+).*", "\\1", a))
a[order(num)]
#[1] "8file.txt" "12file.txt" "66file.txt"
You could also left-pad the strings with spaces by giving sprintf a field width, which produces the ordering you want as long as the non-numeric parts all have the same length:
a[order(sprintf("%10s",a))]
[1] "8file.txt" "12file.txt" "66file.txt"
You can use str_sort(..., numeric = TRUE) function from stringr package:
library(stringr)
a <- c("12file.txt", "8file.txt", "66file.txt")
str_sort(a, numeric = TRUE)
#> [1] "8file.txt" "12file.txt" "66file.txt"

How to format a number with specified level of precision?

I would like to create a function that returns a vector of numbers whose precision is limited to n significant figures, without trailing zeros and not in scientific notation.
e.g., I would like
somenumbers <- c(0.000001234567, 1234567.89)
myformat(x = somenumbers, n = 3)
to return
[1] 0.00000123 1230000
I have been playing with format, formatC, and sprintf, but they don't seem to want to work on each number independently, and they return the numbers as character strings (in quotes).
This is the closest that I have gotten:
> format(signif(somenumbers,4), scientific=FALSE)
[1] " 0.000001235" "1235000.000000000"
You can use the signif function to round to a given number of significant digits. If you don't want extra trailing 0's then don't "print" the results but do something else with them.
> somenumbers <- c(0.000001234567, 1234567.89)
> options(scipen=5)
> cat(signif(somenumbers,3),'\n')
0.00000123 1230000
>
sprintf seems to do it:
sprintf(c("%1.8f", "%1.0f"), signif(somenumbers, 3))
[1] "0.00000123" "1230000"
How about:
myformat <- function(x, n) {
# format each element on its own so the values do not share a common width
noquote(sapply(x, function(v) format(signif(v, n), scientific=FALSE)))
}
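
The padding in the question's attempt arises because format() gives all elements of a vector a common number of decimals and a common width. A minimal sketch of the per-element idea that returns a plain character vector; the name fmt_signif is made up for this example:

```r
fmt_signif <- function(x, n) {
  # Round to n significant figures, then format every value separately
  # so that no element is padded to match the others
  vapply(x, function(v) format(signif(v, n), scientific = FALSE),
         character(1))
}

somenumbers <- c(0.000001234567, 1234567.89)
fmt_signif(somenumbers, n = 3)
# [1] "0.00000123" "1230000"
```

The result is still character: a numeric vector cannot carry its own display format, so the quotes can only be hidden at print time, for example with noquote() or cat().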
