I'm trying to do a simple sum over a large column in R. The answer comes back all right, but not with the precision that I want. For example:
> tail(x)
[,1]
[1999995,] 1999995
[1999996,] 0
[1999997,] 1999997
[1999998,] 0
[1999999,] 1999999
[2e+06,] 0
If I do a sum(x), I get:
> sum(x)
[1] 1e+12
Which is fine, but I'd like it to print out something with more significant figures like 158683269821 or something. Is there an option in sum() to specify how many sigfigs I want?
The options I wound up using were thus:
> options("scipen"=100, "digits"=4)
> sum(x)
[1] 1000000000000
> sum(x)
[1] 1000000000000
> sum(x)+1
[1] 1000000000001
> sum(x)+2
[1] 1000000000002
> sum(x)-1
[1] 999999999999
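As an alternative to changing global options, a single value can be formatted on its own. The following is a minimal sketch; the vector x here is a stand-in (an assumption, not the original data), built from the odd numbers below 2e6 so that its sum is also 1e12.

```r
# Stand-in data (an assumption): the odd numbers below 2e6, which sum to 1e12
x <- as.numeric(seq(1, 1999999, by = 2))

total <- sum(x)

# Fixed notation for this one value, leaving options() untouched
format(total, scientific = FALSE)
format(total + 1, scientific = FALSE)

# sprintf() gives the same digits as a character string
sprintf("%.0f", total)

# print() accepts a digits argument for a one-off display
print(total + 1, digits = 15)
```

format() and sprintf() return character strings, so they suit display rather than further arithmetic.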
I am doing a simple operation, multiplying a decimal number and converting the result to an integer, but the result is different from what I expected. Apologies if this is discussed elsewhere; I was not able to find any straightforward answers.
> as.integer(1190.60 * 100)
[1] 119059
EDIT:
So, I have to convert that to character and then do as.integer to get what is expected
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> class(temp)
[1] "numeric"
> class(temp2)
[1] "numeric"
> as.character(temp2)
[1] "119060"
> as.integer(temp2)
[1] 119059
> as.integer(as.character(temp2))
[1] 119060
EDIT2: Following the comments (thanks, @andrey-shabalin):
> temp2
[1] 119060
> as.integer(temp2)
[1] 119059
> as.integer(round(temp2))
[1] 119060
EDIT3: As mentioned in the comments, the question is about the behaviour of as.integer and not about floating-point calculations.
The answer to this is "floating point error". You can see this easily by checking the following:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> temp2 - 119060
[1] -1.455192e-11
Due to floating point error, temp2 isn't exactly 119060 but:
> sprintf("%.20f", temp2)
[1] "119059.99999999998544808477"
If you use as.integer on a float, it works the same way as trunc, i.e. it rounds toward 0, discarding the fractional part. So in this case the result is 119059.
If you convert to character using as.character(), R uses at most 15 significant digits. In this example that would be "119059.999999999". The next digit is another 9, so R rounds this up to 119060 before conversion. I avoid this in the code above by using sprintf() instead of as.character().
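The round-then-convert fix from the edits above can be wrapped in a small helper. This is only a sketch, and to_hundredths is a hypothetical name, not a base R function.

```r
# Hypothetical helper: scale by 100, round to the nearest whole number,
# and only then convert, so values just below an integer are not truncated
to_hundredths <- function(x) as.integer(round(x * 100))

temp2 <- 1190.60 * 100

as.integer(temp2)         # truncation toward zero
to_hundredths(1190.60)    # rounding first

# Compare doubles with a tolerance rather than with ==
temp2 == 119060
isTRUE(all.equal(temp2, 119060))
```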
I have a for loop calculating fields in a dataframe, averaging out the difference over missing data over several rows. I used:
for(l in i:k-1){data[l,j]=as.numeric(data[l-1,j])+increase}
intending to change the fields from i to k-1.
What happens is that the fields from i-1 to k get changed. Is this what R should do?
I appreciate that I can get the results I really want by enclosing the k-1 in brackets (as I did in the first for loop below), but don't understand why R is interpreting my i:k-1 as i-1:k (see the second loop).
Example:
> data[l,j]=0
> data[l-1,j]=0
> data[l+1,j]=0
> increase
[1] -2.8
> i
[1] 11019
> k
[1] 11020
> for(l in i:(k-1)){data[l,j]=as.numeric(data[l-1,j])+increase}
> data[l-1,j]
[1] "0"
> data[l+1,j]
[1] "0"
> data[l,j]
[1] "-2.8"
> data[l,j]=0
> for(l in i:k-1){data[l,j]=as.numeric(data[l-1,j])+increase}
> data[l,j]
[1] "7.7"
> data[l+1,j]
[1] "0"
> data[l-1,j]
[1] "10.5"
> data[l-2,j]
[1] "11.1"
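The behaviour follows from operator precedence: in R the colon operator binds more tightly than binary minus, so i:k-1 is read as (i:k)-1, which is the sequence from i-1 to k-1. A short check with the values of i and k from the example:

```r
i <- 11019
k <- 11020

# ':' is evaluated before binary '-', so this is (i:k) - 1
i:k - 1

# Parentheses make the upper bound k - 1, as intended
i:(k - 1)

# seq() takes the bounds as separate arguments, which avoids the ambiguity
seq(i, k - 1)
```

The full precedence table is documented under ?Syntax.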
I have a data.table of data numbers in character format that I am trying to convert to numeric. However, the issue is that the numbers are very long and I want to retain all of the digits without any rounding from R. For example, the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim){
b <- tim - tim%/%10^12*10^12
# hhmmssffffff
ms <- b%%10^6; b <-(b-ms)/10^6
ss <- b%%10^2; b <-(b-ss)/10^2
mm <- b%%10^2; hh <-(b-mm)/10^2
# if hours>=22, subtract 24 (previous day)
hh <- hh - (hh>=22)*24
return(hh+mm/60+ss/3600+ms/(3600*10^6))
}
However the rounding occurs in R so datapoints now have the same time. See first 5 elements after converting:
TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
You should test whether they are really equal (for example with all.equal()).
By default R prints only 7 significant digits (see options("digits")), but the remaining digits are still stored.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
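Printing is only one part of the story. A double carries about 15 to 17 significant decimal digits, so a 20-digit timestamp cannot pass through as.numeric() intact. One way around this is to cut the time fields out of the string before converting. The sketch below assumes the layout yyyymmddhhmmssffffff suggested by the comment in convert_time_fast; convert_time_str is a hypothetical name.

```r
# Hypothetical variant of convert_time_fast that works on the character
# values directly. Assumes each string is laid out as yyyymmddhhmmssffffff.
convert_time_str <- function(tim) {
  hh <- as.numeric(substr(tim, 9, 10))
  mm <- as.numeric(substr(tim, 11, 12))
  ss <- as.numeric(substr(tim, 13, 14))
  us <- as.numeric(substr(tim, 15, 20))  # microseconds
  # if hours >= 22, subtract 24 (previous day), as in the original function
  hh <- hh - (hh >= 22) * 24
  hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)
}

TimeO <- c("20110630224701281482", "20110630224701281523")
TimeOC <- convert_time_str(TimeO)

# Show enough digits to tell the two results apart
print(TimeOC, digits = 15)
TimeOC[1] == TimeOC[2]
```

Because every field is small when it is converted, no digits are lost before the arithmetic starts.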
Is there a way to use tidyr's extract_numeric() to extract negative numbers?
For example,
> extract_numeric("2%")
[1] 2
> extract_numeric("-2%")
[1] 2
I'd really like the second call to return -2.
Bill
PS: While it doesn't concern me today, I suspect cases such as "-$2.00" complicate any general solution.
extract_numeric is pretty simple:
> extract_numeric
function (x)
{
as.numeric(gsub("[^0-9.]+", "", as.character(x)))
}
<environment: namespace:tidyr>
It just replaces any char that isn't 0 to 9 or "." with nothing. So "-1" will become 1, and there's nothing you can do about it... except maybe file an enhancement request to tidyr, or write your own...
extract_num = function(x){as.numeric(gsub("[^0-9.-]+","",as.character(x)))}
will sort of do it:
> extract_num("-$1200")
[1] -1200
> extract_num("$-1200")
[1] -1200
> extract_num("1-1200")
[1] NA
Warning message:
In extract_num("1-1200") : NAs introduced by coercion
but a regexp could probably do better, only allowing minus signs at the start...
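Along those lines, here is one possible sketch that keeps the digit stripping from extract_numeric and treats a minus sign as meaningful only when it appears before the first digit. extract_signed is a hypothetical name, not part of tidyr.

```r
extract_signed <- function(x) {
  x <- as.character(x)
  # TRUE when a minus sign occurs before any digit, e.g. "-2%" or "$-1200"
  negative <- grepl("^[^0-9]*-", x)
  # Same stripping as extract_numeric: keep digits and the decimal point
  magnitude <- as.numeric(gsub("[^0-9.]+", "", x))
  ifelse(negative, -magnitude, magnitude)
}

extract_signed(c("2%", "-2%", "-$2.00", "$-1200"))
```

A minus sign after the first digit, as in "1-1200", is ignored here, so that input yields 11200 rather than NA. Whether that is desirable depends on the data.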
Just use sub if there's a single number in the string. Here's an approach:
The function:
myfun <- function(s) as.numeric(sub(".*?([-+]?\\d*\\.?\\d+).*", "\\1", s))
Examples:
> myfun("-2%")
[1] -2
> myfun("abc 2.3 xyz")
[1] 2.3
> myfun("S+3.")
[1] 3
> myfun(".5PPP")
[1] 0.5
I am just trying to round a number in R, like:
> round(1.327076e-09)
I would like it to result in
> 1.33e-09
but it results in
> 0
Which function can I use?
Try signif:
> signif(1.326135235e-09, digits = 3)
[1] 1.33e-09
Use signif:
x <- 1.327076e-09
signif(x,3)
[1] 1.33e-09
or sprintf:
sprintf("%.2e",x)
[1] "1.33e-09"
The function round will do rounding and you can specify the number of decimals:
x <- 1.327076e-09
round(x, 11)
[1] 1.33e-09
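The digits value of 11 works because this particular x is of order 1e-09. As a sketch, the number of decimals needed for n significant digits can be derived from the magnitude of x. round_sig is a hypothetical helper and assumes x is nonzero.

```r
# Hypothetical helper: round to n significant digits using round().
# Assumes x is nonzero, since log10(0) is -Inf.
round_sig <- function(x, n) {
  decimals <- n - 1 - floor(log10(abs(x)))
  round(x, decimals)
}

x <- 1.327076e-09
round_sig(x, 3)
signif(x, 3)
```

In practice signif() already does this, so the helper mainly shows how the two functions relate.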
Rising to the challenge set by @Joris and @GavinSimpson - to use trunc on this problem, do the following:
library(plyr)
round_any(x, 1e-11, floor)
[1] 1.32e-09