Remove all leading zeros and turn the number into a positive whole number - R

I'm trying to convert .000278 into 278 in R, but all the functions I've found can only move it to .278. It needs to be a whole, positive number.
Is there a function that will remove the .0+ preceding a number?

I assume you want to apply this to many numbers at once (otherwise I might not understand the question).
a <- c(0.003, 0.0056, 0.000278)  # Example data
a*(10^(nchar(a)-2))
[1] 3 56 278
Make sure scientific notation is disabled with scipen, as discussed in this post (e.g., by doing options(scipen=999)). This is important because nchar counts the number of characters the numbers have (minus 2 for "0.").
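To see how the exponent is derived for a single value (a sketch of the expression above):

```r
a <- 0.000278
nchar(a)               # 8 characters in "0.000278"
a * 10^(nchar(a) - 2)  # subtract 2 for the leading "0.", i.e. scale by 10^6
# [1] 278
```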
Another approach. Use the package stringr.
a <- c(0.003, 0.0056, 0.000278)  # Example data
library(stringr)
as.numeric(str_replace(a, "0\\.", ""))  # escape the dot so it is matched literally
[1] 3 56 278
Note that with this method you need to convert the output of str_replace back to numeric using as.numeric (not ideal).
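The same replacement can be done without stringr using base R's sub(), again escaping the dot so it is matched literally rather than as the regex wildcard (a sketch):

```r
a <- c(0.003, 0.0056, 0.000278)
as.numeric(sub("0\\.", "", a))  # strip the first "0." then re-coerce
# [1]   3  56 278
```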

Or use substr and regexpr, which gives exactly what you wanted:
x <- 0.000278
as.numeric(substr(x, regexpr("[^0.]", x), nchar(x)))
[1] 278
And this also works for different numbers, just set:
options("scipen"=100, "digits"=10) # Force R not to use exponential notation (e.g. e-8)
and you could try for example:
z <- 0.000000588
as.numeric(substr(z, regexpr("[^0.]", z), nchar(z)))
[1] 588
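To see what the two pieces contribute, evaluated separately (a sketch):

```r
x <- 0.000278
regexpr("[^0.]", x)  # position of the first character that is neither '0' nor '.'
nchar(x)             # total length of the printed number
substr(x, 6, 8)      # the slice those two positions delimit
# [1] "278"
```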

Try this (len is adjustable):
a <- 0.000278
a*(10^match(TRUE, round(a, 1:(len = 10)) == a)) # counts the number of decimal places
# 278
Use as.numeric(a) in case a is of type character.
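The match/round trick can be wrapped into a small hypothetical helper (the name n_decimals is mine, not from the answer) that returns the number of decimal places:

```r
# count decimal places: the first rounding level at which
# rounding no longer changes the value
n_decimals <- function(a, len = 10) match(TRUE, round(a, 1:len) == a)

a <- 0.000278
n_decimals(a)         # 6
a * 10^n_decimals(a)  # 278
```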

> as.numeric(gsub("^[0.]*", "", paste(0.000278)))
[1] 278


How to make R display 19 digit number as it is stored?

I have a dataset with a key column which is basically a 19 digit integer.
I'm using tibbles so I use options(pillar.sigfig = 22) to display larger numbers and not scientific notation.
The problem is, I notice that the number stored in the column and the one that is displayed are slightly different; specifically, the last 3 digits differ.
E.g.
options(pillar.sigfig = 22)
x <- 1099324498500011011
But when I try to return the number I get 1099324498500011008.
I'm not sure why R would change the last 3 digits and since it is a key, it makes my data unusable for analysis.
I have tried the usual options(scipen = 999) for suppressing scientific notation but it does not seem to work on tibbles.
How do I get the same 19 digit number as I intend to store it?
Sorry to be the bearer of bad news, but R only has
a numeric type (double) using 64 bits and approximately sixteen decimal digits of precision
an integer type (int) using 32 bits
There is nothing else. You may force the print function to show you nineteen digits, but that just means ... you are looking at three digits of randomness.
19 digits for (countable) items are common, and often provided by (signed or unsigned) int64_t types, which R does not have natively but approximates via the integer64 type in the bit64 package.
So the following may be your only workaround:
> suppressMessages(library(bit64))
> x <- as.integer64("123456790123456789")
> x
integer64
[1] 123456790123456789
> x - 1
integer64
[1] 123456790123456788
>
The good news is that integer64 is reasonably well supported by data.table and a number of other packages.
PS It really is 19 digits where it bites:
> as.integer64(1.2e18) + 1
integer64
[1] 1200000000000000001
> as.integer64(1.2e19) + 1
integer64
[1] <NA>
Warning message:
In as.integer64.double(1.2e+19) : NAs produced by integer64 overflow
>
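If the key is never used for arithmetic, another common workaround (an assumption about your workflow, not something from the question) is to read and store the column as character, which preserves every digit:

```r
x <- "1099324498500011011"  # keep the key as a string: no precision loss
nchar(x)                    # all 19 digits survive
# coercing back to double reproduces the problem from the question:
print(as.numeric(x), digits = 22)
```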

How to sort numbers in a column based on the entire number, not just the leading digit, in base R?

I have a column of numbers that I want to sort, with the largest number on top and then in descending order. I tried using sort(), but the result places 900 as a larger number than 1000. Basically it's sorting based on the first digit in a number. I don't want that. I want it to sort based on the entire number. Is there a way to do this in base R, without using any library like dplyr.
I've tried multiple ways and lots of googling, and I'm surprised I couldn't find a way to do this. Maybe I'm just going nuts.
SOLUTION: As usual, one more googling attempt after I post and I found a solution. Turns out the class of the column was 'character'. I used as.numeric() to change it to numeric values, then I used sort and it did what I wanted. I'm keeping this post up anyway for anyone in the future that may need it.
This is the sort of thing that would happen if R were treating the numbers as character strings, as below.
set.seed(1431)
x <- round(runif(10, 0, 25000))
x
# [1] 10799 19832 14455 7657 4517 743 4922 13462 22738
# [10] 19636
xc <- as.character(x)
sort(x, decreasing=TRUE)
# [1] 22738 19832 19636 14455 13462 10799 7657 4922 4517
# [10] 743
sort(xc, decreasing=TRUE)
# [1] "7657" "743" "4922" "4517" "22738" "19832" "19636"
# [8] "14455" "13462" "10799"
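The fix described in the question, applied to a small character vector (a sketch):

```r
xc <- c("900", "1000", "25")
sort(xc, decreasing = TRUE)              # lexicographic: compares digit by digit
# [1] "900"  "25"   "1000"
sort(as.numeric(xc), decreasing = TRUE)  # numeric after conversion
# [1] 1000  900   25
```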

Count some patterns in a string using R

I have a string that contains some patterns like this:
my_string = "`d#k`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`0.55`n$l`0.4`0.1`0.25`0.28`0.18`0.3`0.17`0.2`0.03`!lk`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`0.04`vnabgjd`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`0.02`pogk(`1.01`0.71`0.86`0.89`0.79`0.91`0.78`0.81`0.64`r!#^##niw`0.0014`0.0020`9.9999`9.9999`0.0020`0.0022`0.0032`9.9999`0.0000`
As you can see, there is a repeating pattern: a [`nonnumber] token followed by several [`number.num~] tokens.
So I want to count how many [`number.num~] tokens fall between each pair of [`nonnumber] tokens.
I tried to use regex
index <- gregexpr("`(\\w{2,20})`\\d\\.\\d(.*?)`\\D",cle)
regmatches(cle,index)
but with this code the [`\D] matches overlap, so it can't count how many patterns there are.
If you know any method for this, please leave a reply.
Using strsplit. We split at the backtick and use the positions of the values that coerce to NA under as.numeric to count how many numbers sit between consecutive non-numeric labels. Note that we need to drop the empty first element after strsplit and append an NA at the end of the numeric vector. The result is a vector named after the non-numeric elements using setNames (not very good names, actually, but it demonstrates what's going on).
s <- el(strsplit(my_string, "\\`"))[-1]
s.num <- suppressWarnings(as.numeric(s))
setNames(diff(which(is.na(c(s.num, NA)))) - 1,
         s[is.na(s.num)])
# d#k n$l !lk vnabgjd pogk( r!#^##niw
# 9 9 9 9 9 9
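Step by step on a shortened stand-in string (not the full one from the question), to show what each stage produces:

```r
my_string <- "`d#k`0.55`0.55`n$l`0.4`0.1`0.25`"
s <- el(strsplit(my_string, "`"))[-1]     # drop the empty first element
s.num <- suppressWarnings(as.numeric(s))  # non-numeric labels become NA
setNames(diff(which(is.na(c(s.num, NA)))) - 1, s[is.na(s.num)])
# d#k n$l
#   2   3
```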

Loss of decimal places when calculating mean in R

I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get an arithmetic mean.
Yet if I list the numbers individually and then use the mean() function, I get a different output
I know that this is caused by a rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a vector of values, not multiple values. It is also a generic function, so it is tolerant of additional parameters it doesn't understand (but doesn't warn you about them). See:
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) #only "1" is used here, 5 and 6 are ignored.
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
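If SET1Bearing1slope really is a list object rather than an atomic vector (an assumption; the values below are made up, only the name is from the question), mean() will not work on it directly, so unlist it first:

```r
SET1Bearing1slope <- list(1.2345678901, 2.3456789012, 3.4567890123)
mean(unlist(SET1Bearing1slope))  # collapse the list to a numeric vector, then average
```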
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing a list of arguments, and only the first one is considered.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other languages limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic

R - strtoi strange behavior to get week of year

I use strtoi to determine the week of year in the following function:
to.week <- function(x) strtoi(format(x, "%W"))
It works fine for most dates:
> to.week(as.Date("2015-01-11"))
[1] 1
However, when I try dates between 2015-02-23 and 2015-03-08, I get NA as a result:
> to.week(as.Date("2015-02-25"))
[1] NA
Could you please explain to me what causes the problem?
Here is an implementation that works:
to.week <- function(x) as.integer(format(x, "%W"))
The reason strtoi fails is that by default it tries to interpret numbers as if they were octal when they are preceded by a "0". Since "%W" returns "08", and 8 doesn't exist in octal, you get the NA. From ?strtoi:
Convert strings to integers according to the given base using the C function strtol, or choose a suitable base following the C rules.
...
For decimal strings as.integer is equally useful.
Also, you can use week (e.g. from the lubridate package):
week(as.Date("2015-02-25"))
Though you may have to offset the result of that by 1 to match your expectations.
You can also slightly modify your code like this
to.week <- function(x) strtoi(format(x, "%W"), 10)
and use base 10.
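The difference between the bases, spelled out (a sketch):

```r
strtoi("07", base = 8L)   # 7 : a valid octal string
strtoi("10", base = 8L)   # 8 : "10" read as octal
strtoi("08", base = 10L)  # 8 : an explicit base 10 avoids the NA
as.integer("08")          # 8 : the as.integer alternative
```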
