Most significant decimal digit (or 0.3 - 0.1 = 0.1) [closed]

This operation should return 2, but it returns 1 instead because of the floating point representation:
a <- .3
b <- .1
floor((a-b)*10)
I basically want the first digit after the decimal point of the actual base-10 result, not the computer's floating-point result. In this case a and b have only one decimal digit, but in most situations there will be more. Examples:
0.3-0.1=0.2 so I want the 2
0.5-0.001=0.499 so I want the 4
0.925-0.113=0.812 so I want the 8
0.57-0.11=0.46 so I want the 4
0.12-0.11=0.01 so I want the 0
that is, not rounding but truncating. I thought of using this:
floor(floor((a-b)*100)/10)
but I'm not sure if that is the best I can do.
update: indeed, it doesn't work (see comments below):
floor(floor((.9-.8)*100)/10) # gives 0 instead of 1
floor(round((.5-.001)*100)/10) # gives 5 instead of 4
update 2: I think this does work (at least in all the cases listed so far):
substring(as.character(a - b), first = 3, last = 3)
Suggestions?

This is not possible, because the information is no longer there: doubles cannot exactly represent decimal numbers.
If you are fine with an approximate solution, you can add a small number and truncate the result. For instance, if you know that your numbers have at most 14 digits, the following would work:
first_digit <- function(x, epsilon = 5e-15)
  floor((x + epsilon) * 10)
first_digit( .3 - .1 ) # 2
first_digit( .5 - .001 ) # 4
first_digit( .925 - .113 ) # 8
first_digit( .57 - .11 ) # 4
first_digit( .12 - .11 ) # 0
If you wanted the first significant digit (that means "first non-zero digit"),
you could use:
first_significant_digit <- function(x, epsilon = 5e-14)
  floor((x + epsilon) * 10^-floor(log10(x + epsilon)))
first_significant_digit(0.12-0.11) # 1
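The OP's "update 2" string trick can also be made to work, because as.character() and format() round to 15 significant digits, which hides the representation error before the digit is extracted. A minimal sketch (the function name is mine, and it assumes the result is positive and less than 1):
first_digit_str <- function(x) {
  s <- format(x, digits = 15, scientific = FALSE)  # "0.2", "0.499", "0.01", ...
  as.integer(substring(s, 3, 3))  # character 3 is the first digit after "0."
}
first_digit_str(.3 - .1)    # 2
first_digit_str(.5 - .001)  # 4
first_digit_str(.12 - .11)  # 0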

Related

0.5<0.5 returns TRUE in R? [duplicate]

Duplicate of: Why are these numbers not equal?
I came across a strange thing in R programming. When I simulate a sequence and want to check whether an element is less than 0.5:
t=(1:1440)/1440
x=(t[720]-t[648])/0.1
x
#output:[1] 0.5
x<1/2
#output:[1] TRUE
x=0.5
x<1/2
#output:[1] FALSE
The two results are completely opposite and obviously the second result is what I want. Can anybody help me?
Floating-point arithmetic is not exact (in R or any other language that uses binary doubles), so a value you expect to be exactly 0.5 may in fact be slightly more (or less). One possible workaround here would be to use rounding:
t <- (1:1440)/1440
x <- (t[720]-t[648]) / 0.1
round(x, 1) < 0.5
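Another option (not from the original answer, but what R FAQ 7.31 recommends for comparisons like this) is all.equal(), which tests equality up to a small tolerance, about 1.5e-8 by default:
t <- (1:1440)/1440
x <- (t[720] - t[648]) / 0.1
isTRUE(all.equal(x, 0.5))              # TRUE: equal within tolerance
x < 0.5 && !isTRUE(all.equal(x, 0.5))  # FALSE: strictly less than, with tolerance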

Reduce number of elements returned by lapply [closed]

As ?lapply states:
lapply returns a list of the same length as X, each element of which
is the result of applying FUN to the corresponding element of X.
Is it still possible to return a list with a smaller length than X?
Code
l <- lapply(1:10, function(u) ifelse(u < 5, return(u), return(NULL)))
Can I place something in the return(NULL) part in order to drop/omit the element completely?
Desired Output
Output of the code section should be the same as:
l[!sapply(l,is.null)]
a list of length 4 with only the elements smaller than 5!
Is it still possible to return a list with a smaller length than X?
Per the documentation quoted by the OP, the answer is "no, not unless you wrap lapply in another call that filters out the unwanted elements either before or after it."
There are many possible workarounds, but I might do ...
# example function
f = function(z) c(a = list(z+1), b = list(z-1), c = if (z > 3) list(z^2))
library(data.table)
data.table(x = 1:10)[x < 5, rbindlist(lapply(x, f), fill=TRUE)]
a b c
1: 2 0 NA
2: 3 1 NA
3: 4 2 NA
4: 5 3 16
... assuming the function returns a named list. If it just returns a scalar, try vectorizing or using sapply or vapply instead of lapply.
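If you would rather avoid the data.table dependency, a base-R sketch (my wrapper, not part of the original answer) gets the same effect by building the full list and then filtering, exactly like the OP's one-liner:
l <- Filter(Negate(is.null),
            lapply(1:10, function(u) if (u < 5) u else NULL))
length(l)  # 4: only the elements smaller than 5 remain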

Floor not giving expected results when passed a calculation [duplicate]

Duplicate of: Why are these numbers not equal?
Why does only B out of A and B below equal 29? What is different about using a calculation as the x argument?
> #A
> floor(x = (1.45/0.05))
[1] 28
> #B
> floor(x = 29)
[1] 29
As @bouncyball indicated, it is a floating point problem.
If you enter 1.45 or 0.05 into an online IEEE 754 converter, you will notice that their binary representations are infinitely long (i.e. you can't write 1.45 as a finite string in binary).
Because your PC doesn't have infinite storage, it chops the representation off at some point, meaning it actually stores 1.45 as something very slightly off from 1.45 (never mind the exact digits), and the same happens for 0.05.
So internally your computer computes the division as something like 28.999999999999996. It isn't naive when it prints the result, though: the default display rounds to 7 significant digits, so 28.999999999999996 shows up as 29. But floor() truncates instead of rounding, so it turns 28.999999999999996 into 28.
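To see the hidden digits for yourself, print with more precision; and one hedged workaround is to round away the representation error before flooring:
print(1.45/0.05, digits = 17)        # 28.999999999999996
floor(1.45/0.05)                     # 28
floor(round(1.45/0.05, digits = 9))  # 29: round off the error, then floor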
Hope that makes sense.

R - Equivalent inputs resulting in different outputs for a sequence [duplicate]

Duplicate of: Why does the vector gets expanded in the loop
I am running into some behaviour with R that I find confusing. Does anyone have any insight into what is going on here?
Define two objects
i <- 5
nr <- 10
So i + 2 and nr + 1 give:
> i+2
[1] 7
> nr+1
[1] 11
So to create a sequence from 7 to 11 I could do this:
7:11
But my question is: why does this not produce the same result?
i+2:nr+1
We already established above that its input numbers are equivalent. Obviously I'm missing something here, but I just don't know what it is.
You have just discovered the prime R gotcha, namely: 1:n-1 produces the sequence 0, 1, 2, ..., n-1.
To obtain what you desire, wrap the expressions in brackets:
1:(n-1)
or use
seq.int(1, n-1)
The reason for the issue is operator precedence; see ?Syntax.
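Applied to the OP's expression, the precedence rule means i+2:nr+1 parses as i + (2:nr) + 1 rather than (i+2):(nr+1):
i <- 5
nr <- 10
i + 2:nr + 1      # 8 9 10 11 12 13 14 15 16, i.e. 5 + c(2,...,10) + 1
(i + 2):(nr + 1)  # 7 8 9 10 11, the sequence the OP wanted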

Filter out columns in R [closed]

Referring to the post Filtering out columns in R, the columns with all 1's and 0's were successfully eliminated from the training data. However, the classification algorithm still complains about columns where most of the values are 0 (all values in the column are 0 except one or two).
I am using the penalizedSVM R package to perform feature selection. Looking more closely at the data set, the function svm.fs complains about the columns where most of the values are 0 except one or two.
How can one modify (or add something to) the following code to achieve this?
library(plyr)          # for colwise()
library(penalizedSVM)  # for svm.fs()

lambda1.scad <- c(seq(0.01, 0.05, 0.01), seq(0.1, 0.5, 0.2), 1)
lambda1.scad <- lambda1.scad[2:3]
seed <- 123
# keep only numeric columns that are not constant 0 or constant 1
f0 <- function(x) any(x != 1) & any(x != 0) & is.numeric(x)
trainingdata <- lapply(trainingdata, function(data)
  cbind(label = data$label, colwise(identity, f0)(data)))
datax <- trainingdata[[1]]
levels(datax$label) <- c(-1, 1)
train_x <- datax[, -1]
train_x <- data.matrix(train_x)
trainy <- datax[, 1]
# replace missing and infinite entries with 0 before fitting
idx <- is.na(train_x) | is.infinite(train_x)
train_x[idx] <- 0
tryCatch(scad.fix <- svm.fs(train_x, y = trainy, fs.method = "scad",
                            cross.outer = 0, grid.search = "discrete",
                            lambda1.set = lambda1.scad, parms.coding = "none",
                            show = "none", maxIter = 1000, inner.val.method = "cv",
                            cross.inner = 5, seed = seed, verbose = FALSE),
         error = function(e) e)
Or one may propose an entirely different solution.
Use the fact that boolean values can be summed and define some tolerance for the proportion of zeros:
sum(x == 0) / length(x) >= tolerance
where this becomes your condition for dropping a column. However, zeros are often not only valid data but critical to the phenomenon being studied. You should think carefully about your algorithm choice and the decision to drop columns before going forward with this approach.
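A sketch of that filter as a reusable function (the function name and the 0.95 default are illustrative, not from the original post):
drop_mostly_zero <- function(df, tolerance = 0.95) {
  # keep non-numeric columns, and numeric columns whose proportion
  # of zeros stays below the tolerance
  keep <- vapply(df,
                 function(x) !is.numeric(x) || sum(x == 0) / length(x) < tolerance,
                 logical(1))
  df[, keep, drop = FALSE]
}
d <- data.frame(label = factor(c(-1, 1, 1, -1)),
                f1 = c(0, 0, 0, 1),   # 75% zeros: kept at the default tolerance
                f2 = c(0, 0, 0, 0))   # 100% zeros: dropped
names(drop_mostly_zero(d))  # "label" "f1"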
