Dealing with missing values when writing logical expressions - r

If you are familiar with SAS, you know that missing values are treated as -Inf (smaller than any number). Therefore, for an expression like this:
If a < 2 then c=1 ;
else c= 5;
When "a" is missing, 1 is assigned to c, because the logical expression "a < 2" evaluates to "True". Please note that my question is about the outcome of the logical expression; I do not care about "c".
I would like the same behavior in R, but there the logical expression evaluates to NA:
a <-NA
a < 2
[1] NA
How can I change the output of this logical expression to TRUE?
I know I can do this:
output <- ifelse(is.na(a), TRUE, a < 2)
But I am looking for something simple. Any ideas?

If you use this frequently enough, then you could define an infix operator to wrap around your ifelse:
`%<%` <- function(a, b) { ifelse(is.na(a), TRUE, a < b) }
So if a is
a <- c(NA, 1, 3, 5, NA)
Then you only need do:
a %<% 2
#> [1] TRUE TRUE FALSE FALSE TRUE

You can use the fact that NA is a logical object.
(a < 2) | is.na(a)
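As a quick check, here is that expression applied to a small vector. The vector is borrowed from the other answer's example, not from the question. It works because NA | TRUE evaluates to TRUE in R, so the missing positions are rescued by is.na(a).

```r
a <- c(NA, 1, 3, 5, NA)
# a < 2 is NA where a is missing; OR-ing with is.na(a) turns those into TRUE
res <- (a < 2) | is.na(a)
res
#> [1] TRUE TRUE FALSE FALSE TRUE
```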

Just for fun, and I am absolutely not recommending this approach:
Ops.sas <- function (e1, e2) {
comparison <- switch(.Generic, `<` = , `>` = , `==` = , `!=` = ,
`<=` = , `>=` = TRUE, FALSE)
if (comparison) {
e1[is.na(e1)] <- -Inf
e2[is.na(e2)] <- -Inf
}
NextMethod(.Generic)
}
And now:
> foo <- structure(c(NA, 2,3,2, NA), class = "sas")
> bar <- structure(c(2,3,2, NA, NA), class = "sas")
> foo < bar
[1] TRUE TRUE FALSE FALSE FALSE
> foo <= bar
[1] TRUE TRUE FALSE FALSE TRUE
> foo == bar
[1] FALSE FALSE FALSE FALSE TRUE
> foo != bar
[1] TRUE TRUE TRUE TRUE FALSE
> foo > bar
[1] FALSE FALSE TRUE TRUE FALSE
> foo >= bar
[1] FALSE FALSE TRUE TRUE TRUE

if(is.na(a) | a < 2){
c=1
} else {
c=2
}
I think that the simplest way is just to add a new condition to your test. In R the | symbol is the element-wise OR; for a single value inside if(), the short-circuit form || is the more idiomatic choice.
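A minimal sketch of the same test for a single value, using the short-circuit operator ||. The name c_val is made up here to avoid masking the base function c(). Because is.na(a) is checked first, a < 2 is never evaluated when a is missing.

```r
a <- NA
# || stops at the first TRUE, so the NA comparison on the right is skipped
c_val <- if (is.na(a) || a < 2) 1 else 2
c_val
#> [1] 1
```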

I think you will have to modify the values in a, for example like this:
a <- c(NA, 1, 3, 4, NA)
a
#> [1] NA 1 3 4 NA
a < 2
#> [1] NA TRUE FALSE FALSE NA
a[is.na(a)] <- -Inf
a < 2
#> [1] TRUE TRUE FALSE FALSE TRUE
Created on 2022-02-04 by the reprex package (v2.0.1)
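If overwriting a is undesirable, the same substitution can be done on a temporary copy. This is only a sketch using base R's replace(); the original vector keeps its NA values.

```r
a <- c(NA, 1, 3, 4, NA)
# replace() returns a modified copy, so `a` itself is left untouched
res <- replace(a, is.na(a), -Inf) < 2
res
#> [1] TRUE TRUE FALSE FALSE TRUE
```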

Related

R: how to check if a vector is found in another vector of different length without using %in%

vector_1 = c(4,3,5,1,2)
vector_2 = c(3,1)
output:
[1] FALSE TRUE FALSE TRUE FALSE
how do I get the output just by using basic operators/loops without using the operator %in% or any functions in R?
See match.fun(`%in%`)
match(vector_1,vector_2, nomatch = 0) > 0
Without "functions" is a bit vague, since virtually anything in R is a function. Probably that's an assignment and a for loop is wanted.
res <- logical(length(vector_1))
for (i in seq_along(vector_1)) {
for (j in seq_along(vector_2)) {
if (vector_1[i] == vector_2[j])
res[i] <- TRUE
}
}
res
# [1] FALSE TRUE FALSE TRUE FALSE
However, that's not very R-ish where you rather want to do something like
apply(outer(vector_1, vector_2, `==`), 1, \(x) x[which.max(x)])
# [1] FALSE TRUE FALSE TRUE FALSE
Data:
vector_1 <- c(4, 3, 5, 1, 2)
vector_2 <- c(3, 1)
One way with sapply() -
sapply(vector_1, function(x) any(x == vector_2))
[1] FALSE TRUE FALSE TRUE FALSE
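Another loop-free variant, offered as a sketch rather than a recommendation: build the full comparison table with outer() and count the matches per row. It assumes neither vector contains NA.

```r
vector_1 <- c(4, 3, 5, 1, 2)
vector_2 <- c(3, 1)
# Rows correspond to vector_1, columns to vector_2;
# a row with at least one TRUE means that element was found
res <- rowSums(outer(vector_1, vector_2, "==")) > 0
res
#> [1] FALSE TRUE FALSE TRUE FALSE
```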

Problem with a vectorized operation with 2 logical vectors: replace values in first vector to NA according to FALSE values in the other vector

I have a question about a vectorized operation with logical vectors. In my problem, there are two vectors: main and secondary. They're both of the same length. I want to replace some elements in the main vector to NA, based on insights I gather from the secondary vector.
The main vector is comprised of TRUE and FALSE that can appear in any random order.
The secondary vector is either:
a sequence of TRUE then a sequence of FALSE with/without NA as the last element; or
all TRUE; or
all FALSE; or
all FALSE with last element as NA; or
all TRUE with last element as NA
I'll provide several examples below and explain the desired algorithm.
A - The most common case
replace x values with NA for positions that are FALSE in y
# here, `x` is the main vector
x <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, NA)
# `y` is the secondary vector
y <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
# `z` is the desired output
z <- c(FALSE, FALSE, TRUE, FALSE, FALSE, NA, NA) # in this case only index 6 in `x` actually changed (index 7 was already NA)
B - The secondary vector is all FALSE
x2 <- c(FALSE, NA)
y2 <- c(FALSE, FALSE)
# desired output
z2 <- c(NA, NA)
C - The secondary vector is all TRUE
x3 <- rep(FALSE, 4)
y3 <- rep(TRUE, 4)
# desired output
z3 <- rep(FALSE, 4)
My attempt
I've almost figured this out, but unfortunately it fails with scenario B.
my_func <- function(main, secondary) {
idx_last_true_in_secondary <- max(which(secondary))
if(idx_last_true_in_secondary == length(secondary)) {
return(main)
}
main[(idx_last_true_in_secondary + 1): length(main)] <- NA
main
}
# case A
my_func(x, y)
#> [1] FALSE FALSE TRUE FALSE FALSE NA NA
# case B
my_func(x2, y2)
#> Warning in max(which(secondary)): no non-missing arguments to max; returning
#> -Inf
#> Error in (idx_last_true_in_secondary + 1):length(main): result would be too long a vector
# case C
my_func(x3, y3)
#> [1] FALSE FALSE FALSE FALSE
My question is whether anyone sees a better way to approach the problem?
This seems to work as expected:
my_func <- function(main,secondary ) {
main[!secondary] <- NA
return(main)
}
my_func(x,y)
[1] FALSE FALSE TRUE FALSE FALSE NA NA
my_func(x2,y2)
[1] NA NA
my_func(x3,y3)
[1] FALSE FALSE FALSE FALSE
We could do it this way:
my_func <- function(x, y) {
replace(x, !y, NA)
}
How it works:
# A
replace(x, !y, NA)
[1] FALSE FALSE TRUE FALSE FALSE NA NA
# B
replace(x2, !y2, NA)
[1] NA NA
# C
replace(x3, !y3, NA)
[1] FALSE FALSE FALSE FALSE
my_func(x,y)
my_func(x2,y2)
my_func(x3,y3)
output:
> my_func(x,y)
[1] FALSE FALSE TRUE FALSE FALSE NA NA
> my_func(x2,y2)
[1] NA NA
> my_func(x3,y3)
[1] FALSE FALSE FALSE FALSE
We can try ifelse like below
> ifelse(y,x,NA)
[1] FALSE FALSE TRUE FALSE FALSE NA NA
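The question mentions that the secondary vector may end in NA, a case none of the examples cover. With main[!secondary] <- NA an NA index selects nothing, so that position would keep its value, whereas ifelse() would return NA there. The sketch below assumes NA in the secondary vector should be treated like FALSE; the function name mask_main and the sample vectors are made up for illustration.

```r
mask_main <- function(main, secondary) {
  # Assumption: an NA in `secondary` counts as FALSE, so that position is blanked too
  main[is.na(secondary) | !secondary] <- NA
  main
}

x4 <- c(FALSE, TRUE, FALSE, TRUE)
y4 <- c(TRUE, TRUE, FALSE, NA)
mask_main(x4, y4)
#> [1] FALSE TRUE NA NA
```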

How does the all function work in R using two expressions?

I have an issue with the all function in R.
Let a and b be two vectors:
a <- c(Inf,0)
b <- c(1,0)
When I evaluate all(a==b), the function returns FALSE, which is expected. When I evaluate all(a==Inf), it also returns FALSE. So far everything works as expected, but when I evaluate all((a==b) | (a==Inf)), the function returns TRUE.
Could someone explain to me why?
The OR is done element-wise:
a <- c(Inf,0)
b <- c(1,0)
(a==b)
#> [1] FALSE TRUE
(a==Inf)
#> [1] TRUE FALSE
(a==Inf)|(a==b)
#> [1] TRUE TRUE
At each position at least one of the two comparisons is TRUE, so every element of the combined vector is TRUE.
When you type help("|"), you will see that | is element-wise OR.
In this case, given
> (a == b)
[1] FALSE TRUE
> (a == Inf)
[1] TRUE FALSE
the expression (a == b) | (a == Inf) is equivalent to
c(FALSE, TRUE) | c(TRUE, FALSE)
and the resulting logical vector is c(TRUE, TRUE), which gives you TRUE when you apply all to it.
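To make the contrast explicit, the sketch below compares OR-ing the element-wise comparisons before calling all() with OR-ing the two separate all() results. The variable names elementwise and separately are made up for illustration.

```r
a <- c(Inf, 0)
b <- c(1, 0)
# OR each pair of elements first, then ask whether every element is TRUE
elementwise <- all((a == b) | (a == Inf))
# Ask each question on its own, then OR the two single answers
separately <- all(a == b) || all(a == Inf)
elementwise
#> [1] TRUE
separately
#> [1] FALSE
```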

Function to find if a value is greater than all prior values in a vector

This should be very simple, but my R knowledge is limited.
I'm trying to find out, for each value, whether it is greater than all previous values.
An example would be
x<-c(1.1, 2.5, 2.4, 3.6, 3.2)
results:
NA True False True False
My real values are measurements with many decimal places so I doubt I will get the same value twice
You can use cummax() to get the biggest value so far. x >= cummax(x) basically gives you the answer, although element 1 is TRUE, so you just need to change that:
> out = x >= cummax(x)
> out[1] = NA
> out
[1] NA TRUE FALSE TRUE FALSE
Although @Marius has got this absolutely correct, here is an option with a loop:
sapply(seq_along(x), function(i) all(x[i] >= x[seq_len(i)]))
#[1] TRUE TRUE FALSE TRUE FALSE
Or same logic with explicit for loop
out <- logical(length(x))
for(i in seq_along(x)) {
out[i] <- all(x[i] >= x[seq_len(i)])
}
out[1] <- NA
out
#[1] NA TRUE FALSE TRUE FALSE
We can use lapply
unlist(lapply(seq_along(x), function(i) all(x[i] >=x[seq(i)])))
#[1] TRUE TRUE FALSE TRUE FALSE
Or with max.col
max.col(t(sapply(x, `>=`, x)), 'last') > seq_along(x)
#[1] FALSE TRUE FALSE TRUE FALSE
Or with a for loop:
mx <- x[1]
i1 <- logical(length(x))
for (i in seq_along(x)) {
  if (x[i] > mx) i1[i] <- TRUE  # strictly greater than the running maximum
  mx <- max(mx, x[i])
}
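If ties can occur after all, comparing each value strictly against the running maximum of the earlier elements avoids flagging a repeated maximum as TRUE. This is a minimal sketch that assumes x has at least one element and no NA; the helper name is_new_max is made up for illustration.

```r
is_new_max <- function(x) {
  # Running maximum of everything before each position (-Inf before the first)
  prev_max <- c(-Inf, head(cummax(x), -1))
  out <- x > prev_max
  out[1] <- NA  # the first element has no prior values to compare with
  out
}

x <- c(1.1, 2.5, 2.4, 3.6, 3.2)
is_new_max(x)
#> [1] NA TRUE FALSE TRUE FALSE
```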

which rows match a given vector in R

I have a matrix A,
A = as.matrix(data.frame(col1 = c(1,1,2,3,1,2), col2 = c(-1,-1,-2,-3,-1,-2), col3 = c(2,6,1,3,2,4)))
And I have a vector v,
v = c(-1, 2)
How can I get a vector of TRUE/FALSE that compares the last two columns of the matrix and returns TRUE if the last two columns match the vector, or false if they don't?
I.e., If I try,
A[,c(2:3)] == v
I obtain,
col2 col3
[1,] TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
[5,] TRUE FALSE
[6,] FALSE FALSE
Which is not what I want, I want both columns to be the same as vector v, more like,
result = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
Since the first, and 5th rows match the vector v entirely.
Here's a simple alternative
> apply(A[, 2:3], 1, function(x) all(x==v))
[1] TRUE FALSE FALSE FALSE TRUE FALSE
Oops, by looking through the R mailing list I found an answer: https://stat.ethz.ch/pipermail/r-help/2010-September/254096.html
check.equal <- function(x, y)
{
isTRUE(all.equal(y, x, check.attributes=FALSE))
}
result = apply(A[,c(2:3)], 1, check.equal, y=v)
Not sure I need to define a function and do all that, maybe there are easier ways to do it.
Here's another straightforward option:
which(duplicated(rbind(A[, 2:3], v), fromLast=TRUE))
# [1] 1 5
results <- rep(FALSE, nrow(A))
results[which(duplicated(rbind(A[, 2:3], v), fromLast=TRUE))] <- TRUE
results
# [1] TRUE FALSE FALSE FALSE TRUE FALSE
Alternatively, as one line:
duplicated(rbind(A[, 2:3], v), fromLast=TRUE)[-(nrow(A)+1)]
# [1] TRUE FALSE FALSE FALSE TRUE FALSE
A dirty one:
result <- c()
for(n in 1:nrow(A)){result[n] <-(sum(A[n,-1]==v)==2)}
> result
[1] TRUE FALSE FALSE FALSE TRUE FALSE
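A vectorized alternative without apply() or an explicit loop, given as a sketch: transpose the two columns so that v is recycled against each original row, then count the matches per row. It assumes exact equality is acceptable (no floating-point tolerance) and that there are no NA values.

```r
A <- as.matrix(data.frame(col1 = c(1, 1, 2, 3, 1, 2),
                          col2 = c(-1, -1, -2, -3, -1, -2),
                          col3 = c(2, 6, 1, 3, 2, 4)))
v <- c(-1, 2)
# After transposing, each column holds one row of A[, 2:3],
# so v is recycled down the columns and lines up with it
result <- unname(colSums(t(A[, 2:3]) == v) == length(v))
result
#> [1] TRUE FALSE FALSE FALSE TRUE FALSE
```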

Resources