Complex roots in 2nd polynomial - r

I am dealing with the roots of a second order polynomial, and I only want to store the complex roots (the ones that have a nonzero imaginary part). When I do:
Im(roots)
[1] -1.009742e-28 1.009742e-28
So the program says it is not equal to 0, and so the condition
Im(roots) == 0
is never true. As a result, I am storing the roots that are purely real as well.
Thanks!

This is probably a case of FAQ 7.31 (dealing with the representation and comparison of floating point numbers). The all.equal function is intended for such cases. The best usage would be:
> isTRUE(all.equal(roots[1], 0) )
[1] TRUE
> isTRUE(all.equal(roots[2], 0) )
[1] TRUE
Read ?all.equal for all the gory details.
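Applied to the original problem, one way to keep only the genuinely complex roots is to drop every root whose imaginary part is numerically zero in the all.equal sense. A minimal sketch (polyroot() and the example polynomial stand in for the OP's actual roots):
roots <- polyroot(c(-1, 0, 1))   # x^2 - 1: both roots are real, up to rounding
is_real <- vapply(Im(roots), function(im) isTRUE(all.equal(im, 0)), logical(1))
roots[!is_real]                  # only the genuinely complex roots remain (none here)
# complex(0)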

DWin is almost certainly right that you're getting numbers with magnitudes that small due to the imprecision of floating point arithmetic.
To correct for it in your application, you might want to use zapsmall(x, digits). zapsmall() is a handy utility function that rounds to 0 any numbers that are very close to it (within digits decimal places).
Here, riffing off an example from its help page:
thetas <- 0:4*pi/2
coords <- exp(1i*thetas)
coords
# [1] 1+0i 0+1i -1+0i 0-1i 1-0i
## Floating point errors obscure the big picture
Im(coords) == 0
# [1] TRUE FALSE FALSE FALSE FALSE
Re(coords) == 0
# [1] FALSE FALSE FALSE FALSE FALSE
## zapsmall makes it all better
Im(zapsmall(coords)) == 0
# [1] TRUE FALSE TRUE FALSE TRUE
Re(zapsmall(coords)) == 0
# [1] FALSE TRUE FALSE TRUE FALSE

Related

How can all (`all`) be true while none (`any`) are true at the same time?

a <- character()
b <- "SO is great"
any(a == b)
#> [1] FALSE
all(a == b)
#> [1] TRUE
The manual describes ‘any’ like this
Given a set of logical vectors, is at least one of the values true?
So, not even one value in the comparison a == b yields TRUE.
If that is the case, how can ‘any’ return FALSE while ‘all’ returns TRUE? ‘all’ is described as “Given a set of logical vectors, are all of the values true?”. In a nutshell: all values are TRUE and none are TRUE at the same time?
I am no expert, but that looks odd.
Questions:
Is there a reasonable explanation for or is it just some quirk of R?
What are the ways around this?
Created on 2021-01-08 by the reprex package (v0.3.0)
Usually, when comparing a == b the elements of the shorter vector are recycled as necessary. However, in your case a has no elements, so no recycling occurs and the result is an empty logical vector.
The results of any(a == b) and all(a == b) are coherent with the logical quantifiers for all and exists. If you quantify on an empty range, for all gives the neutral element for logical conjunction (AND) which is TRUE, while exists gives the neutral element for logical disjunction (OR) which is FALSE.
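You can see both neutral elements directly on an empty logical vector:
a <- character()
b <- "SO is great"
a == b            # comparing with a zero-length vector gives a zero-length result
#> logical(0)
all(logical(0))   # vacuous "for all": TRUE, the identity element of AND
#> [1] TRUE
any(logical(0))   # vacuous "there exists": FALSE, the identity element of OR
#> [1] FALSE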
As for how to avoid these situations, check whether the vectors have the same length, since comparing vectors of different lengths rarely makes sense.
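For example, a small helper along these lines (same_elements is just a hypothetical name) refuses to call two vectors equal unless their lengths match:
# hypothetical helper: equality only counts when the lengths agree
same_elements <- function(a, b) {
  isTRUE(length(a) == length(b) && all(a == b))
}
same_elements(character(), "SO is great")
#> [1] FALSE
same_elements("SO is great", "SO is great")
#> [1] TRUE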
Regarding question number 2, I know of identical. It works well in all the situations I can think of.
a <- "a"
b <- "b"
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- character(0)
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- NA
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- NULL
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- b
identical(a, b) # TRUE >> works
#> [1] TRUE
identical seems to be a good workaround, though it still feels like a workaround to a part-time developer like me. Are there more solutions? Better ones? And why does R behave like this in the first place (see question 1)?
Created on 2021-01-08 by the reprex package (v0.3.0)
Regarding question 1)
I have no idea whether I am correct, but here are my thoughts:
In R, all() is the complement of any(). For consistency, all(logical(0)) is TRUE, so in your situation you are hitting this edge case.
In mathematics, this is analogous to a set being both open and closed. I'm not a computer scientist, so I can't really speak to why one of the greybeards from way back when implemented this in either R or S.
Regarding question 2)
I think the other responses have answered this well.
Another solution provided by the shiny package
is isTruthy().
The package introduced the concept of truthy/falsy that “generally indicates
whether a value, when coerced to a base::logical(), is TRUE or FALSE”
(see the documentation).
require(shiny, quietly = TRUE)
a <- "a"
b <- "b"
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- character(0)
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- NA
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- NULL
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- b
isTruthy(a == b) # TRUE >> works
#> [1] TRUE
One of the advantages is that you can use other operators like %in%
or match(), too.
The situation in R is that you never quite know what a function will return when it fails:
some functions return NA, others NULL, and yet others zero-length vectors.
isTruthy() makes it easier to handle that diversity.
Unfortunately, when one is not writing a shiny app it hardly makes sense to load the package, because aside from isTruthy() shiny only adds a large number of unneeded web-app features.
Created on 2021-01-10 by the reprex package (v0.3.0)
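If you do not want to pull in shiny just for this, a rough base-R approximation for the scalar comparisons above is isTRUE(), which only returns TRUE for a single, non-NA TRUE (note that, unlike isTruthy(), it will not accept a longer vector that happens to be all TRUE):
b <- "b"
a <- character(0)
isTRUE(a == b)   # FALSE: a zero-length comparison is not a single TRUE
#> [1] FALSE
a <- NA
isTRUE(a == b)   # FALSE: NA is not TRUE
#> [1] FALSE
a <- b
isTRUE(a == b)   # TRUE
#> [1] TRUE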

Difference between expr(mean(1:10)) and expr(mean(!!(1:10)))

Going through the metaprogramming sections of Hadley's book Advanced R (2nd ed.), I am having quite a tough time understanding the concept. I have been programming with R for a while, but this is the first time I have come across metaprogramming. This exercise question in particular confuses me:
"The following two calls print the same, but are actually different:
(a <- expr(mean(1:10)))
#> mean(1:10)
(b <- expr(mean(!!(1:10))))
#> mean(1:10)
identical(a, b)
#> [1] FALSE
What’s the difference? Which one is more natural?"
When I eval() them, they both return the same result:
> eval(a)
[1] 5.5
> eval(b)
[1] 5.5
When I look inside the a and b objects, the second one does print differently, but I am not sure what this means in terms of their difference:
> a[[2]]
1:10
> b[[2]]
[1] 1 2 3 4 5 6 7 8 9 10
Also, if I just run them without eval(expr(...)), they return different results:
mean(1:10)
[1] 5.5
mean(!!(1:10))
[1] 1
My guess is that without expr(...), !!(1:10) acts as a double negation which, through coercion, essentially forces all the numbers to 1, hence a mean of 1.
My questions are:
Why does !! act differently with and without expr(...)? I would expect eval(expr(mean(!!(1:10)))) to return the same result as mean(!!(1:10)), but it does not.
I still do not fully grasp the difference between the a object and the b object.
Thank you in advance.
!! here is used not as double negation, but as the unquote operator from rlang.
Unquoting is one inverse of quoting. It allows you to selectively
evaluate code inside expr(), so that expr(!!x) is equivalent to x.
The difference between a and b is that the argument remains as an unevaluated call in a, while it is evaluated in b:
class(a[[2]])
[1] "call"
class(b[[2]])
[1] "integer"
The behaviour of a may be an advantage in some circumstances because it delays evaluation, or a disadvantage for the same reason; when it is a disadvantage, it is often the cause of much frustration. If the argument were a larger vector, the size of b would grow with it, while a would stay the same.
See section 19.4 of Advanced R for more details.
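To make the size point concrete, here is a small sketch using lobstr::obj_size() (the sizes in the comments are approximate, and a non-sequence vector is used because R stores literal sequences like 1:1e6 compactly):
library(rlang)
library(lobstr)
x <- runif(1e6)            # ~8 MB of doubles
small <- expr(mean(x))     # stores only the symbol x
big   <- expr(mean(!!x))   # unquoting inlines the whole evaluated vector
obj_size(small)            # a few hundred bytes
obj_size(big)              # roughly 8 MB: the vector is embedded in the call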
Here is the difference. When we negate (!) an integer vector, numbers other than 0 are converted to FALSE and 0 to TRUE. With another negation, i.e. double (!!), the FALSE values are changed to TRUE and vice versa:
!0:5
#[1] TRUE FALSE FALSE FALSE FALSE FALSE
!!0:5
#[1] FALSE TRUE TRUE TRUE TRUE TRUE
With the OP's example, it is all TRUE
!!1:10
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
and TRUE/FALSE coerce to 1/0
as.integer(!!1:10)
#[1] 1 1 1 1 1 1 1 1 1 1
thus the mean would be 1
mean(!!1:10)
#[1] 1
Regarding the 'a' vs. 'b'
str(a)
#language mean(1:10)
str(b)
#language mean(1:10)
Both are language objects, and both evaluate to the mean of the numbers 1:10:
all.equal(a, b)
#[1] TRUE
If we need the mean of the 10 numbers, the first one is the correct way.
We can see the second option evaluated without unquoting, i.e. getting a mean value of 1, by using quote() instead:
eval(quote(mean(!!(1:10))))
#[1] 1
eval(quote(mean(1:10)))
#[1] 5.5
!! has special meaning when used inside expr(). Outside expr() you will get different results because !! is then a plain double negation.
Even inside expr() the two versions are different: 1:10 is an expression that produces an integer vector when evaluated, while !!(1:10) is the result of evaluating that same expression. An expression and the result of evaluating it are different things.

How does ada::predict.ada work?

I tried on Cross Validated but got no response, and this is a technical, implementation-centric question.
I used ada::ada in R to create a boosted model which is based on decision trees.
It normally returns a matrix with stats on the predicted results compared to the expected outcome, something like this:
       FALSE  TRUE
FALSE  11023  1023
TRUE     997  5673
That's cool, good accuracy.
Now it's time to predict on new data. So I went with:
predict(myadamodel, newdata=giveinputs())
But instead of a simple TRUE/FALSE answer I got:
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[49] FALSE FALSE
Levels: FALSE TRUE
I presume that this ada object is an ensemble and that I received an answer from each classifier.
But in the end I need a final straight answer: TRUE/FALSE. If that's all I can get, I need to know how the "ada" function computes the final answer that was used to build the statistics. I would check that myself, but the "ada" function is precompiled.
How do I get the final TRUE/FALSE answer to comply with the statistics that ada returns from the learning phase?
I've attached an example that you can copy-paste:
library(ada)

mydata = data.frame(a=numeric(0), b=double(0), r=logical(0))
for(i in -10:10)
  for(j in 20:-4)
    mydata[length(mydata[,1])+1,] = c(a=i, b=j, r=(j > i))

myada = ada(mydata[,c("a","b")], mydata[,"r"])
print(myada)
predict(myada, data.frame(a=4,b=7))
Please note that the r column is for some reason expressed as 0/1. I don't know why, or how to tell data.frame not to convert TRUE/FALSE to 0/1, but the idea stays the same.
OK. The reproducible example helped. It looks to be a quirk in the way predict works when you pass new data that has just one row. In this case, you're getting an estimate from each of the iterations (the default number of iterations is 50). Note that you only get two values returned when you do
predict(myada, data.frame(a=4:3,b=7:8))
This is basically because of a use of sapply within the predict function. We can make our own which doesn't have this problem.
predict.ada <- ada:::predict.ada
body(predict.ada)[[12]] <- quote(
  tmp <- t(do.call(rbind,
    lapply(1:iter, function(i)
      f(f = object$model$trees[[i]], dat = newdata))))
)
and then we can run
predict.ada(myada, newdata=data.frame(a=4,b=7))
# [1] TRUE
# Levels: FALSE TRUE
so this new value is predicted to be TRUE. This was tested in ada_2.0-3 and may break in other versions.
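If you would rather not patch the package function, a cruder workaround (a sketch that relies only on the one-row quirk described above) is to pad the single-row newdata to two rows and keep just the first prediction:
newrow <- data.frame(a = 4, b = 7)
predict(myada, newdata = rbind(newrow, newrow))[1]
# expected: TRUE, matching the patched predict.ada above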
Also, in your test data, when you use c() to merge elements they must all be the same data type, or they will be converted to the lowest common denominator type that can hold all the values. If you're mixing types, it's better to use a list(). For example
mydata[length(mydata[,1])+1,] = list(a=i,b=j, r= (j > i))
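A quick illustration of that coercion, independent of ada: c() promotes everything to a common type, so the logical r silently becomes a number, while list() preserves each element's type:
c(a = 1L, b = 7L, r = TRUE)       # TRUE is coerced to integer 1
# a b r
# 1 7 1
list(a = 1L, b = 7L, r = TRUE)$r  # each element keeps its own type
# [1] TRUE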

Partial string matching with grep and regular expressions

I have a vector of three-character strings, and I'm trying to write a command that will find which members of the vector have a particular letter as the second character.
As an example, say I have this vector of 3-letter strings...
example = c("AWA","WOO","AZW","WWP")
I can use grepl and glob2rx to find strings with W as the first or last character.
> grepl(glob2rx("W*"),example)
[1] FALSE TRUE FALSE TRUE
> grepl(glob2rx("*W"),example)
[1] FALSE FALSE TRUE FALSE
However, I don't get the right result when I try using it with glob2rx("*W*"):
> grepl(glob2rx("*W*"),example)
[1] TRUE TRUE TRUE TRUE
I am sure my understanding of regular expressions is lacking; however, this seems like a pretty straightforward problem and I can't seem to find the solution. I'd really love some assistance!
For future reference, I'd also really like to know whether I could extend this to the case where I have longer strings. Say I have strings that are 5 characters long; could I use grepl in such a way as to return strings where W is the third character?
I would have thought that this was the regex way:
> grepl("^.W",example)
[1] TRUE FALSE FALSE TRUE
If you wanted a particular position that is prespecified then:
> grepl("^.{1}W",example)
[1] TRUE FALSE FALSE TRUE
This would allow programmatic calculation:
pos <- 2
n <- pos - 1
grepl(paste0("^.{", n, "}W"), example)
[1] TRUE FALSE FALSE TRUE
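For the follow-up about longer strings, the same pattern applies; here is a sketch with made-up 5-character strings, asking for W as the third character:
example5 <- c("ABWDE", "WAAAA", "XYWZQ", "QWERT")
grepl("^.{2}W", example5)
[1] TRUE FALSE TRUE FALSE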
If you have 3-character strings and need to check the second character, you could just test the appropriate substring instead of using regular expressions:
example = c("AWA","WOO","AZW","WWP")
substr(example, 2, 2) == "W"
# [1] TRUE FALSE FALSE TRUE

Lazy evaluation of `which` function arguments?

If there are multiple boolean expressions as arguments to the which function, are they evaluated lazily?
For example:
which(test1 & test2)
If test1 returns false, then test2 is not evaluated as the compound expression will be false anyway.
With if there can be efficiency gains as a result of that behavior. It is documented to work that way, and I don't think it is due to lazy evaluation. Even if you force()-ed that expression, it would still only evaluate a series of &&'s until it hit a single FALSE. See this help page:
?Logic
@XuWang probably deserves the credit for emphasizing the difference between "&" and "&&". The "&" operator works on vectors and returns vectors. The "&&" operator acts on scalars (actually vectors of length 1) and returns a vector of length 1. When offered a vector of length > 1 on either side, it works only on the first value of each and emits a warning. It is only the "&&" version that does what is being called "lazy" evaluation. You can see that the "&" operator is not acting in a "lazy" fashion with a simple test:
fn1 <- function(x) print(x)
fn2 <- function(x) print(x)
x1 <- sample(c(TRUE, FALSE), 10, replace=TRUE)
fn1(x1) & fn2(x1) # the first two lines of output show that both sides are evaluated, regardless of the first values
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
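By contrast, here is a sketch showing that && (unlike &) really does stop at the first FALSE, which is the behaviour people usually mean by "lazy" here:
left  <- function() { print("left evaluated");  FALSE }
right <- function() { print("right evaluated"); TRUE  }
left() && right()    # right() is never called: && stops at the first FALSE
# [1] "left evaluated"
# [1] FALSE
left() & right()     # & evaluates both sides
# [1] "left evaluated"
# [1] "right evaluated"
# [1] FALSE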
