Main question
In what practical programming situations or R "idioms" would you only want to check the first element of each of two vectors for logical comparison? (I.e. disregarding the rest of each vector as in && and ||.)
I can see the use of & and | in R, where they do element-wise logical comparison of two vectors. But I cannot see a real life practical use of their sibling operators && and ||. Can anyone provide a clear example of their use?
The documentation, help("&&"), says:
The longer form evaluates left to right examining only the first element of each vector.
Evaluation proceeds only until the result is determined.
The longer form is appropriate for programming control-flow and typically preferred in if clauses.
The issue for me is the following: I interpret the documentation of && and || to say that for logical vectors x and y, the && and || operators only use x[1] and y[1] to provide a result.
> c(TRUE, FALSE, FALSE) && c(TRUE, FALSE)
[1] TRUE
> c(TRUE, FALSE, FALSE) && c(FALSE, FALSE)
[1] FALSE
> c(FALSE, FALSE, FALSE) && c(TRUE, FALSE)
[1] FALSE
> c(FALSE, FALSE, FALSE) && c(FALSE, FALSE)
[1] FALSE
I don't see any "programming control-flow" situations where I would have two logical vectors and I would disregard any values past the first element of each.
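It seems that x && y acts like x[1] & y[1], and x || y acts like x[1] | y[1]. (In R 4.3.0 and later, calling && or || with an argument of length greater than one is an error, so the examples in this question, including the benchmarks below, only run in older versions of R.)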
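In control flow each operand of && or || is meant to be a single TRUE/FALSE. A minimal sketch of making that explicit: reduce each vector with all() or any(), or index [1] when only the first element really matters. The vectors x and y are made up for illustration.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Element-wise: one result per position.
x & y
# [1]  TRUE FALSE FALSE

# Control flow needs a single TRUE/FALSE, so collapse each vector
# explicitly before combining the results with && (or ||).
if (all(x) && all(y)) "every element is TRUE" else "not all TRUE"
if (any(x) && any(y)) "each vector has a TRUE" else "some vector has no TRUE"

# If only the first elements matter, say so explicitly:
x[1] && y[1]
# [1] TRUE
```

Written this way, the intent is visible in the code instead of relying on && silently ignoring everything after the first element.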
Benchmarks
Here's a test function that evaluates how often these formulations return the same result using randomly generated logical vectors of different lengths. This suggests that they are doing the same thing.
test <- function( n, maxl=10 ) {
  foo <- lapply( X=seq_len( n ), FUN=function(i) {
    x <- runif( n=sample( size=1, maxl ) ) > 0.5
    y <- runif( n=sample( size=1, maxl ) ) > 0.5
    sameres <- all.equal( (x||y), (x[1]|y[1]) )
    sameres
  } )
  table( unlist( foo ) )
}
test( 10000 )
Yields:
TRUE
10000
Here's a benchmarking test on which is faster. It starts by creating a list of lists, where each of the N items in dat is a list containing two randomly generated logical vectors. Then we apply each of the variants to the same data to see which is faster.
library(rbenchmark)
N <- 100
maxl <- 10
dat <- lapply( X=seq_len(N), FUN=function(i) {
  list( runif( n=sample( size=1, maxl ) ) > 0.5,
        runif( n=sample( size=1, maxl ) ) > 0.5) } )
benchmark(
  columns=c("test","replications","relative"),
  lapply(dat, function(L){ L[[1]] || L[[2]] } ),
  lapply(dat, function(L){ L[[1]][1] | L[[2]][1] } )
)
Yields the following output (removed the \n characters and extra whitespace):
test replications relative
2 lapply(dat, function(L) { L[[1]][1] | L[[2]][1] }) 100 1.727
1 lapply(dat, function(L) { L[[1]] || L[[2]] }) 100 1.000
In this benchmark, the || formulation is faster than cherry-picking the first element of each argument. But I'm still curious as to why one would need such an operator.
I guess that there are a couple of reasons, but probably the most important one is the short-circuit behavior. If a evaluates to FALSE in a && b, then b is not evaluated. Similarly, if a evaluates to TRUE in a || b, then b is not evaluated. This allows writing code like
v <- list(1, 2, 3, 4, 5)
idx <- 6
# v[[idx]] is only evaluated when idx is a valid position
if (idx <= length(v) && v[[idx]] == 5) {
  print("foo")
} else {
  print("bar")
}
Otherwise one would need to write something like
if (idx <= length(v)) {
  if (v[[idx]] == 5) {
    print("foo")
  } else {
    print("bar")
  }
} else {
  print("bar")
}
which is 1) much less readable, and 2) repeats bar, which is bad if bar is a bigger piece of code.
You cannot use & in the if condition, because & evaluates both sides: v[[idx]] is computed even when idx is out of bounds, and that is an error for lists in R:
if (idx <= length(v) & v[[idx]] == 5) {
  print("foo")
} else {
  print("bar")
}
# Error in v[[idx]] : subscript out of bounds
Here is a small illustration of the short-circuit behavior:
t <- function() { print("t called"); TRUE }
f <- function() { print("f called"); FALSE }
f() && t()
# [1] "f called"
# [1] FALSE
f() & t()
# [1] "f called"
# [1] "t called"
# [1] FALSE
t() || f()
# [1] "t called"
# [1] TRUE
t() | f()
# [1] "t called"
# [1] "f called"
# [1] TRUE
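A common idiom built on this short-circuiting is a chain of guards in which each cheap check protects the next one. The helper is_positive_scalar() below is hypothetical, a sketch of the pattern rather than a standard function.

```r
# Hypothetical helper: TRUE only for a single, non-missing, positive number.
is_positive_scalar <- function(x) {
  # && stops at the first FALSE, so x > 0 is only evaluated once we know
  # x is a length-one numeric that is not NA.
  is.numeric(x) && length(x) == 1L && !is.na(x) && x > 0
}

is_positive_scalar(3)        # TRUE
is_positive_scalar(c(1, 2))  # FALSE (the length check fails first)
is_positive_scalar(NULL)     # FALSE
is_positive_scalar("a")      # FALSE

# With & every operand is evaluated. For NULL the last one is logical(0),
# so the whole expression collapses to logical(0) and if() would fail.
x <- NULL
is.numeric(x) & length(x) == 1L & x > 0
# logical(0)
```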
Related
I want to check if all elements in a list are named. I've come up with this solution, but I wanted to know if there is a more elegant way to check this.
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)
# FALSE, all elements are named.
any(stringr::str_length(methods::allNames(x)) == 0L)
# TRUE, at least one element is not named. Throw an error here.
any(stringr::str_length(methods::allNames(y)) == 0L)
# TRUE, at least one element is not named. Throw an error here.
any(stringr::str_length(methods::allNames(z)) == 0L)
I am not sure if the following base R code works for your general cases, but it seems to work for the ones in your post.
Define a function f to check the names
f <- function(lst) length(lst) == sum(names(lst) != "", na.rm = TRUE)
and you will see
> f(x)
[1] TRUE
> f(y)
[1] FALSE
> f(z)
[1] FALSE
We can create a function that checks whether the names attribute is NULL or (|) contains a blank ("") name, and then negates (!) the result:
f1 <- function(lst1) is.list(lst1) && !(is.null(names(lst1)) | '' %in% names(lst1))
Checking:
f1(x)
#[1] TRUE
f1(y)
#[1] FALSE
f1(z)
#[1] FALSE
Or with allNames
f2 <- function(lst1) is.list(lst1) && !("" %in% allNames(lst1))
Checking:
f2(x)
#[1] TRUE
f2(y)
#[1] FALSE
f2(z)
#[1] FALSE
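A base R variant along the same lines, assuming that "named" means every name is a non-empty, non-missing string. The function name all_named() is hypothetical. The !is.null() guard matters here: for list(1, 2) the names are NULL, and all() over zero elements returns TRUE, so && has to stop before that point.

```r
# Hypothetical base R variant of f1/f2.
all_named <- function(lst) {
  nms <- names(lst)
  # Without the !is.null() guard, all() over zero names would return TRUE
  # for an unnamed list; && stops before that happens.
  is.list(lst) && !is.null(nms) && all(nzchar(nms) & !is.na(nms))
}

all_named(list(a = 1, b = 2))  # TRUE
all_named(list(1, b = 2))      # FALSE
all_named(list(1, 2))          # FALSE
```

The !is.na() term is there because nzchar(NA) returns TRUE by default.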
Assume you have an input a which goes into an existing function fun. I am looking for a function preserved(a, fun(a)) which returns TRUE if the type is unchanged and FALSE otherwise.
Example:
a <- 1L # [1] 1 - integer
b <- a[FALSE] # integer(0)
c <- a[2] # [1] NA
d <- ncol(a) # NULL
e <- a/0 # [1] Inf
f <- 1 # [1] 1 - numeric
g <- as.factor(a) # [1] 1 Levels: 1
Expected output:
preserved(a, 2L) = TRUE
preserved(a, b) = FALSE
preserved(a, c) = FALSE
preserved(a, d) = FALSE
preserved(a, e) = FALSE
preserved(a, f) = FALSE
preserved(a, g) = FALSE
A bad hack (not vectorized) would be
preserved <- function(a, b) {
  if (length(b) == length(a)) {
    if (is.na(b) == is.na(a) &
        class(b) == class(a) &
        is.null(b) == is.null(a) &
        is.nan(b) == is.nan(a) &
        is.factor(b) == is.factor(a)) {
      return(TRUE)
    } else {
      return(FALSE)
    }
  } else {
    return(FALSE)
  }
}
If you just want to compare two objects, you probably want to use all.equal() or identical() rather than trying to generate every possible pairwise combination of classes (since that number could be infinite).
Something close to what you want that might be more useful is applying makeActiveBinding() to issue messages (or warnings or errors) if type coercion is attempted:
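# active binding
preserved <- local( {
  x <- NULL
  function(v) {
    if (!missing(v)) {
      # compare the full class vectors: class() can have length > 1
      # (e.g. c("matrix", "array")), which would break a plain != in if()
      if (!identical(class(x), class(v))) {
        message(sprintf("Object is being coerced from %s to %s",
                        toString(class(x)), toString(class(v))))
      }
      x <<- v
    }
    x
  }
})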
makeActiveBinding("z", preserved, .GlobalEnv)
z
## NULL
z <- 2
## Object is being coerced from NULL to numeric
z <- "hello"
## Object is being coerced from numeric to character
z <- factor("a", levels = c("a", "b", "c"))
## Object is being coerced from character to factor
z
## [1] a
## Levels: a b c
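Following the identical() suggestion, here is a minimal sketch of the two-argument check the question asks for. The name type_preserved() is hypothetical, and the criteria (same class, same storage type, same length, same NA pattern) are an assumption chosen to reproduce the question's expected output, not a general definition of "type".

```r
# Hypothetical sketch: b "preserves the type" of a when class, storage type,
# length and NA pattern all match. The && chain stops at the first mismatch,
# so is.na() is never called on NULL (which would warn).
type_preserved <- function(a, b) {
  identical(class(a), class(b)) &&
    identical(typeof(a), typeof(b)) &&
    length(a) == length(b) &&
    all(is.na(a) == is.na(b))
}

a <- 1L
type_preserved(a, 2L)             # TRUE
type_preserved(a, a[FALSE])       # FALSE: length differs
type_preserved(a, a[2])           # FALSE: NA pattern differs
type_preserved(a, ncol(a))        # FALSE: class is "NULL"
type_preserved(a, a / 0)          # FALSE: class is "numeric"
type_preserved(a, 1)              # FALSE: class is "numeric"
type_preserved(a, as.factor(a))   # FALSE: class is "factor"
```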
I'm wondering why my first if statement returns an error when my input data is an object of class numeric.
I have clearly stated that the first if statement should only run IF the data class is "data.frame", but when the data is numeric, this first if statement returns an error! Am I missing anything here?
Update:
I have changed the instances of & to &&, but now when data is a data.frame the function doesn't produce any output. For example, run: standard(mtcars)
standard <- function(data){
  if(class(data) == "data.frame" && ncol(data) > 1){
    data[paste0(names(data), ".s")] <- scale(data)
    data
  }
  if(class(data) == "data.frame" && ncol(data) == 1){
    colnames(data) <- paste0(names(data), ".s")
    data <- scale(data)
    data
  }
  if(class(data) != "data.frame"){
    d <- as.data.frame(data)
    colnames(d) <- paste0("Var", ncol(d), ".s")
    data <- scale(d)
    data
  }
}
###### EXAMPLES: #######
standard(mtcars[,2]) ##Problem: `Error in if(class(data) == "data.frame" & ncol(data) > 1)`
standard(mtcars["wt"]) ## OK
standard(mtcars) ## after UPDATE, doesn't give any output
am I missing anything here?
& always evaluates both operands, while && stops as soon as the result is determined:
FALSE && stop("boh")
#R> [1] FALSE
TRUE && stop("boh")
#R> Error: boh
FALSE & stop("boh")
#R> Error: boh
See help("Logic")
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.
After your edits
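You do not get any output because a function returns the value of the last expression it evaluates. For a data frame, that is the final if(), whose condition is FALSE, so the function invisibly returns NULL. Use return() or if ... else instead; see help("function") and help("if"). Here is a small example: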
f1 <- function(x){
  if(x < 0){
    x <- -1
    x
  }
  if(x > 0){
    x <- 1
    x
  }
}
f1(-1)
# prints nothing: the last if() is FALSE and returns invisible NULL
f2 <- function(x){
  if(x < 0){
    x <- -1
    x
  } else if(x > 0){
    x <- 1
    x
  }
}
f2(-1)
#R> [1] -1
f3 <- function(x){
  if(x < 0){
    x <- -1
    return(x)
  }
  if(x > 0){
    x <- 1
    return(x)
  }
}
f3(-1)
#R> [1] -1
tl;dr you should use && rather than & when doing flow-control, because & always evaluates its second argument, while && short-circuits if the first argument is false. If the argument isn't a data frame (or matrix) then ncol(x) doesn't make sense: see e.g. this question for more information.
Let's unpack it with a simple example:
x <- 1:5
The first part is fine:
class(x) ## "integer"
class(x)=="data.frame" ## FALSE
(although note that you have to be careful, because class(x) might be a vector with more than one element: inherits(x,"data.frame") is safer).
The second part causes the problem:
ncol(x) ## NULL (uh-oh)
ncol(x)>1 ## logical(0) (uh-oh)
Put them together:
class(x)=="data.frame" & ncol(x)>1 ## logical(0)
What does this do?
if (logical(0)) print("hello")
Error in if (logical(0)) print("hello") : argument is of length zero
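Putting the answers' suggestions together (inherits() instead of comparing class(), && for the guards, and a single if/else chain so that the branch that runs is also the value returned), a minimal sketch of a corrected standard() might look like this. It keeps the question's column-naming rules as they are.

```r
standard <- function(data) {
  # inherits() copes with multi-element class vectors; && means ncol() is
  # only consulted once we know data is a data frame.
  if (inherits(data, "data.frame") && ncol(data) > 1) {
    data[paste0(names(data), ".s")] <- scale(data)
    data
  } else if (inherits(data, "data.frame") && ncol(data) == 1) {
    colnames(data) <- paste0(names(data), ".s")
    scale(data)
  } else {
    d <- as.data.frame(data)
    colnames(d) <- paste0("Var", ncol(d), ".s")
    scale(d)
  }
}

head(standard(mtcars), 3)   # original columns plus the *.s columns
head(standard(mtcars["wt"]), 3)
head(standard(mtcars[, 2]), 3)
```

Because the last expression evaluated is always the chosen branch, no explicit return() is needed.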
Given an arbitrarily nested list, how can I find if a list contains empty lists? Consider the following example:
mylist <- list(list("foo", "bar", "baz", list(list())))
I tried rapply, but that skips through lists. While I could use lapply, I'd need to know the level of nesting beforehand. For this exercise, I don't need to know where the list is (although that would be a bonus), I just need a way to detect if there is one.
What about a function like this?
has_empty_list <- function(x) {
  if(is.list(x)) {
    if (length(x)==0) {
      return(TRUE)
    } else {
      return(any(vapply(x, has_empty_list, logical(1))))
    }
  } else {
    return(FALSE)
  }
}
Basically we create a recursive function to look for lists of length 0.
has_empty_list( list(list("foo", "bar", "baz", list(list()))) )
# TRUE
has_empty_list( list(list("foo", "bar", "baz", list(list(4)))) )
# FALSE
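The same recursion can be written more compactly with the short-circuit operators discussed above. This is a sketch equivalent to has_empty_list(); the name has_empty_list2() is hypothetical.

```r
# Same logic as has_empty_list(): non-lists are FALSE (&& stops there),
# empty lists are TRUE (|| stops there), otherwise recurse into elements.
has_empty_list2 <- function(x) {
  is.list(x) &&
    (length(x) == 0L || any(vapply(x, has_empty_list2, logical(1))))
}

has_empty_list2(list(list("foo", "bar", "baz", list(list()))))  # TRUE
has_empty_list2(list(list("foo", "bar", "baz", list(list(4)))))  # FALSE
```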
And here's a modification to find the index of the empty list
find_empty_list <- function(x, index=c()) {
  if(is.list(x)) {
    #list
    if (length(x)==0) {
      if (length(index)==0) {
        return(0)
      } else {
        return(index)
      }
    } else {
      m <- Map(find_empty_list, x, lapply(seq_along(x), function(i) append(index,i)))
      # return the most deeply nested
      return( m[[which.max(lengths(m))]] )
    }
  } else {
    return(numeric())
  }
}
This should return a vector of the index that you can use to find the empty list. For example
( i <- find_empty_list(mylist) )
# [1] 1 4 1
mylist[[i]]
# list()
If the first parameter itself is an empty list, it will return 0
find_empty_list(list())
# [1] 0
and if there is no empty list, it should return an empty vector
find_empty_list(list(1:3, list("c", a~b)))
# numeric(0)
Another convenient option for working with nested lists is the data.tree package:
library(data.tree)
nodes <- as.Node(mylist)
any(nodes$Get(function(node) length(as.list(node))) == 0)
# [1] TRUE
Another approach is to use rrapply() from the rrapply package (an extension of base rapply()):
library(rrapply)
## check if any empty list exists
any(
  rrapply(mylist,
          classes = "list",
          condition = function(x) length(x) < 1,
          f = function(x) TRUE,
          deflt = FALSE,
          how = "unlist"
  )
)
#> [1] TRUE
It is straightforward to update the above call to return the index vectors of any empty lists:
## return flat list with position vectors of empty list
rrapply(mylist,
        classes = "list",
        condition = function(x) length(x) < 1,
        f = function(x, .xpos) .xpos,
        how = "flatten"
)
#> [[1]]
#> [1] 1 4 1
Here, we make use of the .xpos argument which evaluates to the position of the current list element under evaluation.
Note that this automatically returns all empty list positions instead of only one:
mylist2 <- list(list("foo", list(), "baz", list(list())))
rrapply(mylist2,
        classes = "list",
        condition = function(x) length(x) < 1,
        f = function(x, .xpos) .xpos,
        how = "flatten"
)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 1 4 1
## using MrFlick's find_empty_list function
find_empty_list(mylist2)
#> [1] 1 4 1
I'm trying to write an ifelse statement that includes a condition checking whether three variables in the data frame are equal to each other.
I was hoping to use the identical function, but I'm not sure whether it works for three variables.
I've also used the following but R doesn't seem to like this:
geno$VarMatch <- ifelse((geno[c(1)] != '' & geno[c(2)] != '' & geno[c(3)] != '')
& (geno[c(5)] == geno[c(4)] == geno[c(6)]), 'Not Important', 'Important')
Keeps telling me:
Error: unexpected '=='
Am I supposed to specify something as a data.frame/vector etc.? Coming from an SPSS standpoint, I'm slightly confused.
Sorry for the simplistic query.
I see such complicated answers; mine is simple:
all(sapply(list(a, b, c, d), function(x) x == d))
returns TRUE if every element equals d, i.e. if they all equal each other.
Here's a recursive function which generalises to any number of inputs and runs identical on them. It returns FALSE if any member of the set of inputs is not identical to the others.
ident <- function(...){
  args <- c(...)
  if( length( args ) > 2L ){
    # recursively call ident()
    out <- c( identical( args[1] , args[2] ) , ident(args[-1]))
  }else{
    out <- identical( args[1] , args[2] )
  }
  return( all( out ) )
}
ident(1,1,1,1,1)
#[1] TRUE
ident(1,1,1,1,2)
#[1] FALSE
If it's about numeric values, you can put the numbers in a vector and compare its max and min:
nums <- c(3, 3, 3)
if (max(nums) == min(nums)) {
  # all numbers in nums are equal
} else {
  # at least one element has a different value
}
You need to use:
geno$VarMatch <- ifelse((geno[c(1)] != '' & geno[c(2)] != '' &
                         geno[c(3)] != '') &
                        ((geno[c(5)] == geno[c(4)]) &
                         (geno[c(4)] == geno[c(6)])),
                        'Not Important', 'Important')
== is a binary operator, and R's comparison operators cannot be chained: a == b == c is a syntax error (hence "unexpected '=='"), so each pairwise comparison has to be written out and the results combined with &.
You may want to modify this, but here's one attempt at a functional programming approach:
testEqual <- function(x, y) ifelse(x == y, x, FALSE)
all(!!Reduce(testEqual, list(1:10, 1:10)))         # TRUE
all(!!Reduce(testEqual, rep(TRUE, 3)))             # TRUE
all(!!Reduce(testEqual, list(1, 5, 10)))           # FALSE
all(!!Reduce(testEqual, list(TRUE, TRUE, FALSE)))  # FALSE
The double negation converts the values to logical vectors, and all() returns a single Boolean. This only works for numeric values or logical vectors, and it gives FALSE when all the values are 0, because !!0 is FALSE.
I'm throwing this out here just for fun. I'm not sure if I would actually use this approach, but any critiques are welcomed.
This answer is based on John's comment under the question. This is by far the easiest way to go about this.
geno$VarMatch <- ifelse((geno[c(1)] != '' & geno[c(2)] != '' & geno[c(3)] != '')
& (geno[c(5)] == geno[c(4)] & geno[c(5)] == geno[c(6)]), 'Not Important', 'Important')
Simpler than the other answers, and it can be used with basic subsetting/assignment too, e.g.
geno$VarMatch[geno[c(5)] == geno[c(4)] & geno[c(5)] == geno[c(6)]] <- 'Not Important'
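The same pairwise idea generalises to any number of columns: compare every column with the first one and fold the results together with &. The helper rows_all_equal() and the small data frame are hypothetical; with the question's data one might pass cols = 4:6, since df[[i]] also accepts column positions.

```r
# Hypothetical helper: for each row, TRUE when all the given columns hold
# the same value. Chaining == is a syntax error in R, so each column is
# compared with the first and the pairwise results are combined with &.
# NA comparisons propagate as usual with &.
rows_all_equal <- function(df, cols) {
  first <- df[[cols[1]]]
  comparisons <- lapply(cols[-1], function(col) df[[col]] == first)
  Reduce(`&`, comparisons, rep(TRUE, nrow(df)))
}

df <- data.frame(
  a = c("x", "y", "z"),
  b = c("x", "y", "q"),
  c = c("x", "n", "z"),
  stringsAsFactors = FALSE
)
rows_all_equal(df, c("a", "b", "c"))
# [1]  TRUE FALSE FALSE
```

Unlike a rowwise() approach, this stays vectorised: each comparison is done once per column over all rows.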
I think you can write a simple generic function that compares three elements and then apply it to each row with rowwise() and mutate() from dplyr.
library("tidyverse")
set.seed(123)
dta_sample <- tibble(
  colA = sample(letters, 10000, TRUE),
  colB = sample(letters, 10000, TRUE),
  colC = sample(letters, 10000, TRUE)
)
compare_strs <- function(one, two, three) {
  if (one == two) {
    if (two == three) {
      return(TRUE)
    } else {
      return(FALSE)
    }
  } else {
    return(FALSE)
  }
}
dta_sample %>%
  rowwise() %>%
  mutate(all_cols_identical = compare_strs(colA, colB, colC)) %>%
  # For results
  filter(all_cols_identical)
Preview
# A tibble: 25 x 4
# Rowwise:
colA colB colC all_cols_identical
<chr> <chr> <chr> <lgl>
1 w w w TRUE
2 k k k TRUE
3 m m m TRUE
4 b b b TRUE
5 y y y TRUE
6 n n n TRUE
7 e e e TRUE
8 j j j TRUE
9 q q q TRUE
10 a a a TRUE
# … with 15 more rows