Understanding base::identical() -- why is it "safer"?


I have always heard that identical() is "safer" than ==. But playing around with both, it seems that the desired result is obtained with the latter and not the former. I assume this has something to do with the attributes of the object returned by nrow(), but I would appreciate a clarification of why this happens in the example below:
> identical(3653510+2751069, 6404579)
[1] TRUE
> identical((3653510+nrow(df)), nrow(df2))
[1] FALSE
> nrow(df)
[1] 2751069
> nrow(df2)
[1] 6404579
> nrow(df2)-nrow(df)
[1] 3653510
> (3653510+nrow(df))==nrow(df2)
[1] TRUE
So to summarize, I have two questions:
What's going on with identical()?
What do we mean by saying a function is "safer" to use in this context?

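It is most likely a type difference: nrow() returns an integer, while 3653510 + nrow(df) is a double, and identical() also compares the storage type. For example: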
n1 <- 2751069
n2 <- 6404579
identical((3653510 + n1), n2)
#[1] TRUE
Both are of type double (class numeric). If we change one of them to integer, identical() returns FALSE:
n2 <- 6404579L
identical((3653510 + n1), n2)
#[1] FALSE
all.equal compares numeric values within a tolerance and does not distinguish integer from double; we can also specify check.attributes = FALSE:
all.equal((3653510 + n1), n2, check.attributes = FALSE)
#[1] TRUE
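To make the type issue concrete, here is a small sketch. The values n1 and n2 are stand-ins I am assuming for nrow(df) and nrow(df2), written with the L suffix because nrow() returns an integer.

```r
n1 <- 2751069L          # stand-in for nrow(df); nrow() returns an integer
n2 <- 6404579L          # stand-in for nrow(df2)
total <- 3653510 + n1   # double + integer is promoted to double

typeof(total)                      # "double"
typeof(n2)                         # "integer"
total == n2                        # TRUE: == coerces before comparing values
identical(total, n2)               # FALSE: same value, different storage type
identical(3653510L + n1, n2)       # TRUE: integer + integer stays integer
identical(as.integer(total), n2)   # TRUE: types made to match explicitly
isTRUE(all.equal(total, n2))       # TRUE: numeric comparison within a tolerance
```

So identical() is "safer" in the sense that it always returns a single TRUE or FALSE and never silently coerces, which is what you want inside if(); the price is that it is strict about type.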

Related

Is there a better way to check if all elements in a list are named?

I want to check if all elements in a list are named. I've come up with this solution, but I want to know whether there is a more elegant way to check this.
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)
any(stringr::str_length(methods::allNames(x)) == 0L) # FALSE, all elements are
# named.
any(stringr::str_length(methods::allNames(y)) == 0L) # TRUE, at least one
# element is not named.
# Throw an error here.
any(stringr::str_length(methods::allNames(z)) == 0L) # TRUE, at least one
# element is not named.
# Throw an error here.

I am not sure if the following base R code works for your general cases, but it seems to work for the ones in your post.
Define a function f to check the names
f <- function(lst) length(lst) == sum(names(lst) != "",na.rm = TRUE)
and you will see
> f(x)
[1] TRUE
> f(y)
[1] FALSE
> f(z)
[1] FALSE

We can create a function that checks whether the names attribute is NULL or (|) contains a blank ("") name, and then negates (!) the result
f1 <- function(lst1) is.list(lst1) && !(is.null(names(lst1))| '' %in% names(lst1))
Checking:
f1(x)
#[1] TRUE
f1(y)
#[1] FALSE
f1(z)
#[1] FALSE
Or with allNames (from the methods package)
f2 <- function(lst1) is.list(lst1) && !("" %in% methods::allNames(lst1))
Checking:
f2(x)
#[1] TRUE
f2(y)
#[1] FALSE
f2(z)
#[1] FALSE
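Another base R variant, as a sketch only: all_named is a hypothetical helper name, and it assumes that an NA name should also count as "not named". It uses nzchar() to detect empty names.

```r
# TRUE only if a names attribute exists and every name is non-empty and non-NA
all_named <- function(lst) {
  nms <- names(lst)
  !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)

all_named(x)  # TRUE
all_named(y)  # FALSE
all_named(z)  # FALSE
```

Note that this returns FALSE for an empty list, because names(list()) is NULL; whether that is the right answer depends on your use case.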

Checking for sequences in an R vector

I'm looking for a function or operation such that if I have
A <- c(1, 2, 3, 4, 5)
and
B <- c(1, 2, 3)
and C <- c(2, 1)
I'd get TRUE when checking whether A contains B, and FALSE when checking whether A contains C.
Basically, I want the equivalent of the %in% operator, but one that actually cares about the order of the elements.
In a perfect world, I'd be able to do this without some kind of apply statement, but I may end up having to use one.

Well, if one's allowed to use a kind-of apply loop, then this could work:
"%seq_in%" = function(b,a) any(sapply(1:(length(a)-length(b)+1),function(i) all(a[i:(i+length(b)-1)]==b)))
(edited thanks to bug-finding by John Coleman!)
EDIT 2:
I couldn't resist trying to solve the 'non-contiguous' case, too:
# find_subseq() returns positions within vec of ordered elements of x, or stops with NA upon failing
find_subseq = function(x,vec) {
p=match(x[1],vec)
if(is.na(p)||length(x)==1){ p }
else { c(p,p+find_subseq(x[-1],vec[-seq_len(p)])) }
}
"%seq_somewhere_in%" = function(b,a) all(!is.na(find_subseq(b,a)))
Examples:
1:3 %seq_in% 1:10
[1] TRUE
c(3,1,2) %seq_in% 1:10
[1] FALSE
c(1,2,3) %seq_in% c(3,2,1,2,3)
[1] TRUE
2:1 %seq_in% c(1,2,1)
[1] TRUE
1:3 %seq_somewhere_in% c(1,10,10,2,10,10,10,3,10)
[1] TRUE

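Maybe you can define a custom function subseq_check like below. Note that it matches on the string form of the vectors, so it can report false positives: for example, c(1, 2) is found in c(11, 2).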
subseq_check <- function(x,y) grepl(toString(y),toString(x),fixed = TRUE)
which gives
> subseq_check(A,B)
[1] TRUE
> subseq_check(A,C)
[1] FALSE
A Hard-core approach
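subseq_find <- function(x,y) {
  # candidate start positions: wherever the first element of y occurs in x
  inds <- which(x == head(y,1))
  if (length(inds)==0) return(FALSE)
  # isTRUE() turns the NA from a window that runs past the end of x into FALSE
  any(sapply(inds, function(k) isTRUE(all(x[k:(k+length(y)-1)]==y))))
}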
such that
> subseq_find(A,B)
[1] TRUE
> subseq_find(A,C)
[1] FALSE
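For completeness, here is a sketch of the sliding-window idea with explicit guards for the edge cases (an empty pattern, a pattern longer than the vector, NA values). contains_run is a hypothetical name, and treating an empty pattern as contained is my assumption.

```r
# Does `a` contain `b` as a contiguous, ordered run?
contains_run <- function(a, b) {
  n <- length(a)
  m <- length(b)
  if (m == 0L) return(TRUE)    # assumption: an empty pattern is always contained
  if (m > n) return(FALSE)     # the pattern cannot fit
  starts <- seq_len(n - m + 1L)
  # isTRUE() makes a comparison involving NA count as "no match"
  any(vapply(starts,
             function(i) isTRUE(all(a[i:(i + m - 1L)] == b)),
             logical(1)))
}

A <- c(1, 2, 3, 4, 5)
B <- c(1, 2, 3)
C <- c(2, 1)

contains_run(A, B)                 # TRUE
contains_run(A, C)                 # FALSE
contains_run(c(11, 2), c(1, 2))    # FALSE, compares elements rather than strings
```

It still loops under the hood via vapply, so it is not faster than the answers above; it only makes the boundary behaviour explicit.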

Take "a" and "b" as being equal if they are the same in their first 4 decimal places in R

I was wondering how I could have R take any two objects such as "a" and "b" (shown below) as being equal if they are EXACTLY the same in their first 4 decimal places?
P.S. Thus, I'm asking how I could make a conditional command out of this question.
a = 1.234574789
b = 1.234565638

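We can compare the first 6 characters (the leading digit, the decimal point, and four decimals) with substr; this assumes a single digit before the decimal point
substr(a, 1, 6)== substr(b, 1, 6)
Or with sprintf, which rounds to 4 decimal places rather than truncating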
sprintf("%0.4f", a) == sprintf("%0.4f", b)
#[1] TRUE
We can wrap this in a function:
f1 <- function(v1, v2) {
sprintf("%0.4f", v1) == sprintf("%0.4f", v2)
}
f1(a, b)
#[1] TRUE
f1(1.2345, 1.2346)
#[1] FALSE
and if we need to round, then use round
round(a, 4)
#[1] 1.2346
round(b, 4)
#[1] 1.2346
The rounded values can then be compared with ==.
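Rounding and truncating can disagree, which matters if "exactly the same in the first 4 decimal places" is meant literally. A sketch of a truncating comparison follows; same_truncated is a hypothetical helper, and because of binary floating point it is only reliable for values that are not sitting right on a 4th-decimal boundary.

```r
# Compare two numbers after cutting them off (not rounding) at `digits` decimals
same_truncated <- function(a, b, digits = 4) {
  scale <- 10^digits
  trunc(a * scale) == trunc(b * scale)
}

a <- 1.234574789
b <- 1.234565638
same_truncated(a, b)                       # TRUE: both start with 1.2345

x <- 1.23456
y <- 1.23454
same_truncated(x, y)                       # TRUE: both start with 1.2345
round(x, 4) == round(y, 4)                 # FALSE: 1.2346 vs 1.2345
sprintf("%.4f", x) == sprintf("%.4f", y)   # FALSE, sprintf rounds as well
```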

If you want to see how to code a custom function:
> equal <- function(a, b, sig=4) { return (round(a,sig) == round(b,sig)) }
> equal(1.23456, 1.23457)
[1] TRUE
> equal(1.23456, 1.23557)
[1] FALSE

If you want to check if a and b are nearly equal within a certain tolerance, you can use all.equal
a = 1.234574789
b = 1.234565638
a - b
#[1] 9.151e-06
all.equal(a, b, tolerance = 1e-4)
#[1] TRUE
all.equal(a, b, tolerance = 1e-5)
#[1] TRUE
all.equal(a, b, tolerance = 1e-6)
#[1] "Mean relative difference: 7.41226865e-06"
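Since all.equal returns a character string rather than FALSE when the values differ, it should be wrapped in isTRUE() before being used in a condition. A minimal sketch, where nearly_equal is a hypothetical wrapper name:

```r
# Wrap all.equal so the result is always a single TRUE or FALSE
nearly_equal <- function(a, b, tolerance = 1e-5) {
  isTRUE(all.equal(a, b, tolerance = tolerance))
}

a <- 1.234574789
b <- 1.234565638

if (nearly_equal(a, b)) {
  message("a and b are equal within the tolerance")
}

nearly_equal(a, b, tolerance = 1e-6)   # FALSE instead of a character string
```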

check whether a variable is in increasing order in R

Supposing I have a variable
x <- c(1,3,5,7,8)
Now x is in increasing order
How to check whether a variable is in increasing order in R?

From ?is.unsorted:
Test if an object is not sorted (in increasing order) ...
So, in this case, you could:
is.sorted = Negate(is.unsorted)
is.sorted(x)
#[1] TRUE
#> is.sorted(1:5)
#[1] TRUE
#> is.sorted(5:1)
#[1] FALSE
#> is.sorted(sample(5))
#[1] FALSE
#> is.sorted(sort(runif(5)))
#[1] TRUE
#> is.sorted(c(1,2,2,3))
#[1] TRUE
#> is.sorted(c(1,2,2,3), strictly = T)
#[1] FALSE
This function is fast, because it loops over the vector and breaks the loop as soon as an element is not ">=" (or ">", if "strictly = T") from the previous one.
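To illustrate the early exit described above, here is a rough plain-R sketch of the same idea. is_sorted_loop is a hypothetical name; it is not how is.unsorted is actually implemented (that is done internally), and it assumes the input contains no NA values.

```r
# Walk the vector once and stop at the first out-of-order pair
is_sorted_loop <- function(x, strictly = FALSE) {
  for (i in seq_along(x)[-1]) {
    out_of_order <- if (strictly) x[i] <= x[i - 1] else x[i] < x[i - 1]
    if (out_of_order) return(FALSE)   # early exit: no need to look further
  }
  TRUE
}

is_sorted_loop(c(1, 3, 5, 7, 8))               # TRUE
is_sorted_loop(5:1)                            # FALSE
is_sorted_loop(c(1, 2, 2, 3))                  # TRUE
is_sorted_loop(c(1, 2, 2, 3), strictly = TRUE) # FALSE
```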

Try this:
all(diff(x) > 0)
or
all(diff(x) >= 0)
I agree with @flodel that is.unsorted (h/t @alexis_laz) is probably even better.

Look at the differences:
R> x <- c(1,3,5,7,8)
R> allIncreasing <- function(x) all(diff(x)>0)
R> allIncreasing(x)
[1] TRUE
R> y <- x; y[3] <-0
R> allIncreasing(y)
[1] FALSE

Reduce with less than symbol

I never think to use Reduce but I have a problem I thought it would be good for. I want to make sure the size of each iterative element of a vector is equal to or larger than the previous element. I can do this with sapply but my attempt with Reduce fails. How can I use this with Reduce?
#This works
y <- c(1,2,3,2,4,4)
sapply(seq_along(y)[-length(y)], function(i) y[i] <= y[i+1])
#attempts
Reduce('<', c(1,2,3,2,4,4), accumulate = TRUE)
Reduce('<', c(1,2,3,2,4,4))

The diff() function would be a logical choice here (others having explained nicely why Reduce() is not appropriate). It is already set up to compare the differences between elements of a vector and is already vectorised.
> !diff(y) < 0
[1] TRUE TRUE FALSE TRUE TRUE

Desperately bored? I was:
myFun <- function(x,z){
if(is.null(names(z))) names(z) <- z
if(is.null(names(x))) names(x) <- x
if(as.numeric(names(x)) <= as.numeric(names(z))) res <- TRUE else res <- FALSE
names(res) <- names(z)
return(res)
}
as.logical(Reduce(myFun, y, accumulate = TRUE)[-1])
# [1] TRUE TRUE FALSE TRUE TRUE

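It is my understanding from ?Reduce that Reduce first compares the first and second elements. Since 1 < 2 returns TRUE, that result (which is coerced to 1) is reused and compared with the third element, and so on. This means every later step compares the previous result with the next element of y rather than comparing neighbours, which here always turns out to be TRUE. Alternatively, you could compare each element with its successor directly (use < instead of <= for a strictly increasing check):
head(y, -1) <= tail(y, -1)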
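If you really want to use Reduce, the accumulator has to carry the previous element along with the results so far. The following is only a sketch of that idea; step and the prev/ok fields are names I made up for illustration.

```r
y <- c(1, 2, 3, 2, 4, 4)

# The state remembers the previous element and the comparisons made so far
step <- function(state, nxt) {
  list(prev = nxt, ok = c(state$ok, state$prev <= nxt))
}

res <- Reduce(step, y[-1], list(prev = y[1], ok = logical(0)))
res$ok          # TRUE TRUE FALSE TRUE TRUE

# The vectorised equivalent gives the same answer with far less ceremony
diff(y) >= 0    # TRUE TRUE FALSE TRUE TRUE
```

This works, but it mostly shows why Reduce is the wrong tool here: the pairwise comparison needs two neighbouring inputs, not the previous output.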

I don't think Reduce can be used here, as it will in general end up evaluating something like f(f(x[1],x[2]),x[3]), so your comparison for the third element will be TRUE < 3.
identical(y,sort(y))
would appear to be a simpler solution for this problem if a single TRUE/FALSE for the whole vector is enough.
