keep that row if it contains any "TRUE" - r

I have a table that looks like this:
C1 C2 C3 C4 C5....
R1 FALSE FALSE TRUE TRUE
R2 FALSE FALSE NA TRUE
R3 NA NA NA TRUE
R4 NA FALSE FALSE FALSE
R5 NA NA NA NA
.
.
.
I want to keep all rows which contain at least one TRUE. In this table, R1, R2 and R3 need to be kept. Then I can extract the values of another column (C21) from the same table.
Please give me some advice, thank you!

# Example
x <-
matrix(c(FALSE, FALSE, NA, NA, NA, FALSE, FALSE, NA, FALSE, NA, TRUE, NA, NA, FALSE, TRUE, TRUE, FALSE, NA, FALSE, FALSE),
nrow = 5, ncol = 4, dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))
x
# C1 C2 C3 C4
# R1 FALSE FALSE TRUE TRUE
# R2 FALSE FALSE NA FALSE
# R3 NA NA NA NA
# R4 NA FALSE FALSE FALSE
# R5 NA NA TRUE FALSE
# apply the 'any()' function to the rows: it returns TRUE if there is at
# least one TRUE in the row, and NA if there is no TRUE but at least one NA
apply(x, 1, any)
# R1 R2 R3 R4 R5
# TRUE NA NA NA TRUE
# use 'which' to get the row index ('which' drops NA as well as FALSE)
which(apply(x, 1, any))
# R1 R5
# 1 5
# subset the matrix
idx <- which(apply(x, 1, any))
x[idx, ]
# C1 C2 C3 C4
# R1 FALSE FALSE TRUE TRUE
# R5 NA NA TRUE FALSE
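The subset above relies on which() discarding NA along with FALSE. As a small sketch of an alternative, assuming the same matrix x (rebuilt here so the block runs on its own), na.rm = TRUE can be passed through apply() to any(), which gives a plain TRUE/FALSE vector with no NA:

```r
# Same example matrix as above, with every cell spelled out
x <- matrix(c(FALSE, FALSE, NA, NA, NA,
              FALSE, FALSE, NA, FALSE, NA,
              TRUE, NA, NA, FALSE, TRUE,
              TRUE, FALSE, NA, FALSE, FALSE),
            nrow = 5, ncol = 4,
            dimnames = list(paste0("R", 1:5), paste0("C", 1:4)))

# Extra arguments to apply() are forwarded to any(), so NA is ignored per row
keep <- apply(x, 1, any, na.rm = TRUE)
keep
# R1    R2    R3    R4    R5
# TRUE FALSE FALSE FALSE TRUE

# drop = FALSE keeps the result a matrix even if only one row survives
x[keep, , drop = FALSE]
```

With a logical vector that has no NA in it, which() is no longer needed for the subsetting step.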

apply(X = df1, 1, any)
will give you a logical vector (possibly containing NA) that you can then use to subset the rows,
i.e. df1[which(apply(df1, 1, any)), ]

We can use rowSums on the logical matrix (df1 & !is.na(df1)), check whether the sum is greater than 0, and use that logical vector to subset the rows.
Subdf <- df1[rowSums(df1 & !is.na(df1)) >0,]
Subdf
# C1 C2 C3 C4
#R1 FALSE FALSE TRUE TRUE
#R2 FALSE FALSE NA TRUE
#R3 NA NA NA TRUE
Or we can use the na.rm=TRUE in rowSums
df1[rowSums(df1, na.rm=TRUE) > 0,]
We can extract the 'C21' column with Subdf$C21 or Subdf[['C21']] (if the initial dataset is a data.frame) or with Subdf[, 'C21'] for a matrix (the example here does not have 21 columns).
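To make the last step concrete, here is a minimal sketch that rebuilds the table from the question as a data.frame and adds a C21 column. The C21 values and the name flag_cols are made up for illustration. It assumes the TRUE/FALSE test should only look at the logical columns, so those are selected explicitly before calling rowSums:

```r
# Table from the question, plus a hypothetical C21 column (values are invented)
df1 <- data.frame(
  C1  = c(FALSE, FALSE, NA, NA, NA),
  C2  = c(FALSE, FALSE, NA, FALSE, NA),
  C3  = c(TRUE, NA, NA, FALSE, NA),
  C4  = c(TRUE, TRUE, TRUE, FALSE, NA),
  C21 = c(10, 20, 30, 40, 50),
  row.names = paste0("R", 1:5)
)

# Only the logical columns take part in the "at least one TRUE" test
flag_cols <- c("C1", "C2", "C3", "C4")
keep <- rowSums(df1[flag_cols], na.rm = TRUE) > 0

Subdf <- df1[keep, ]
Subdf$C21
# [1] 10 20 30
```

If C21 were included in the rowSums call, its numeric values would be added to the count and every row would pass the test.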

Related

Can someone explain this strange behavior of R's logical test results? [duplicate]

I cannot understand the properties of logical (boolean) values TRUE, FALSE and NA when used with logical OR (|) and logical AND (&). Here are some examples:
NA | TRUE
# [1] TRUE
NA | FALSE
# [1] NA
NA & TRUE
# [1] NA
NA & FALSE
# [1] FALSE
Can you explain these outputs?
To quote from ?Logic:
NA is a valid logical object. Where a component of x or y is NA, the
result will be NA if the outcome is ambiguous. In other words NA &
TRUE evaluates to NA, but NA & FALSE evaluates to FALSE. See the
examples below.
The key there is the word "ambiguous". NA represents something that is "unknown". So NA & TRUE could be either true or false, but we don't know. Whereas NA & FALSE will be false no matter what the missing value is.
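One way to convince yourself of the "ambiguous" rule is to substitute both possible values for the unknown and see whether the outcome changes. The helper below is only an illustrative sketch (resolve is a made-up name, not part of R):

```r
# Try both TRUE and FALSE in place of the unknown value.
# If the outcome is the same either way, it is determined; otherwise it is NA.
resolve <- function(op, known) {
  outcomes <- unique(op(c(TRUE, FALSE), known))
  if (length(outcomes) == 1L) outcomes else NA
}

resolve(`&`, TRUE)   # NA:    TRUE & TRUE is TRUE, FALSE & TRUE is FALSE
resolve(`&`, FALSE)  # FALSE: both substitutions give FALSE
resolve(`|`, TRUE)   # TRUE:  both substitutions give TRUE
resolve(`|`, FALSE)  # NA:    TRUE | FALSE is TRUE, FALSE | FALSE is FALSE
```

These four results agree with what R returns for NA & TRUE, NA & FALSE, NA | TRUE and NA | FALSE.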
It's explained in help("|"):
NA is a valid logical object. Where a component of x or y
is NA, the result will be NA if the outcome is ambiguous. In
other words NA & TRUE evaluates to NA, but NA & FALSE
evaluates to FALSE. See the examples below.
From the examples in help("|"):
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
outer(x, x, "&") ## AND table
# <NA> FALSE TRUE
# <NA> NA FALSE NA
# FALSE FALSE FALSE FALSE
# TRUE NA FALSE TRUE
outer(x, x, "|") ## OR table
# <NA> FALSE TRUE
# <NA> NA NA TRUE
# FALSE NA FALSE TRUE
# TRUE TRUE TRUE TRUE

check changes in consecutive elements (i.e. columns) by row ignoring NAs in a data frame

I'm trying to check whether the elements in a given row are different.
As an example,
> s <- as.data.frame(matrix(ncol=10, nrow=3,
+ c(0,NA,NA,1,1,NA,0,NA,NA,NA,NA,NA,0,NA,0,0,NA,NA,0,0,NA,NA,0,0,0,1,1,NA,NA,NA), byrow = TRUE))
> s
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 NA NA 1 1 NA 0 NA NA NA
2 NA NA 0 NA 0 0 NA NA 0 0
3 NA NA 0 0 0 1 1 NA NA NA
This is my data frame, and I would like to see whether there are any changes from "0" to "1" in consecutive columns, by row, ignoring NA values. For the given example, the output should be a vector such as v <- c("TRUE","FALSE","TRUE"). Is there a simple solution?
In case you actually care about whether the sequence 0:1 is present, not just whether there's a non-zero difference, you can do
library(zoo)
apply(s, 1, function(x) 'TRUE' %in% rollapply(x[!is.na(x)], 2, all.equal, 0:1,
check.attributes = FALSE))
#[1] TRUE FALSE TRUE
To illustrate what I mean:
s[2,5] <- -1
apply(s, 1, function(x) any(diff(x[!is.na(x)]) != 0)) # Rui
# [1] TRUE TRUE TRUE
apply(s, 1, function(x) 'TRUE' %in% rollapply(x[!is.na(x)], 2, all.equal, 0:1,
check.attributes = FALSE))
# [1] TRUE FALSE TRUE
The following does what you describe.
v <- apply(s, 1, function(x) any(diff(x[!is.na(x)]) != 0))
v
#[1] TRUE FALSE TRUE
Here is a base R method that pastes the values together after dropping missing values, and then uses grepl to search for "01" in the resulting string.
apply(t(s), 2,
function(x) grepl("01", paste(x[!is.na(x)], collapse=""), fixed=TRUE))
#[1] TRUE FALSE TRUE
Note that I transposed your dataset because column-wise operations tend to be more efficient, though it probably doesn't matter here.
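For completeness, here is a base R sketch that looks specifically for a 0 followed directly by a 1, without zoo and without pasting strings. It compares each non-NA value with its successor. The function name has_zero_to_one is made up for this example:

```r
s <- as.data.frame(matrix(c(0, NA, NA, 1, 1, NA, 0, NA, NA, NA,
                            NA, NA, 0, NA, 0, 0, NA, NA, 0, 0,
                            NA, NA, 0, 0, 0, 1, 1, NA, NA, NA),
                          ncol = 10, nrow = 3, byrow = TRUE))

has_zero_to_one <- function(x) {
  x <- x[!is.na(x)]                      # ignore missing values
  if (length(x) < 2L) return(FALSE)      # nothing to compare
  # pair each value with the next one and look for a 0 followed by a 1
  any(head(x, -1) == 0 & tail(x, -1) == 1)
}

v <- apply(s, 1, has_zero_to_one)
v
#[1] TRUE FALSE TRUE
```

Unlike the diff()-based test, this does not report a change from 0 to any other value (for example -1) or from 1 to 0.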

R Programming - pmax with data frame

I was trying to use the pmax function in my program. I had a data frame of numbers and I was trying to compare it with a single number. The output had NAs.
I figured out that pmax does not work with a data frame, so I converted the data frame to a matrix, and it worked. I am curious why the data frame returned NA. Does it have something to do with recycling?
Code:
mat <- matrix(runif(100), nrow = 20, ncol = 5)
df <- as.data.frame(mat)
pmax(mat, .5) # No NA's
pmax(df, .5) # Many NA's
The problem is that the second argument is not recycled to the full size of the data: for a data.frame, it is recycled to length(df), which is the number of columns, not the number of cells. The following reproduces the effect
rep(0.5, ncol(df))[df < 0.5]
#[1] 0.5 0.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#[41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Note that only the first 2 values are 0.5: they correspond to the TRUE elements of df < 0.5 among the first five positions. Every later TRUE indexes past the end of the vector, because 0.5 is replicated only ncol(df) times, and so returns NA.
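The same indexing behaviour can be seen on a much smaller case. This is only an illustrative sketch with made-up names (short, mask): a logical index that is longer than the vector returns NA for every TRUE beyond the vector's end.

```r
short <- rep(0.5, 5)                                      # length 5, like rep(0.5, ncol(df))
mask  <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)   # length 7, longer than 'short'

# Positions 1 and 3 exist in 'short'; positions 6 and 7 do not, so they give NA
short[mask]
# [1] 0.5 0.5  NA  NA
```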
If we look at the source of pmax, the line
mmm[change] <- each[change]
is problematic. We can check this by printing 'each' and 'each[change]'. If we modify the function to include the print statements
pmax2 <- function (..., na.rm = FALSE)
{
    elts <- list(...)
    if (length(elts) == 0L)
        stop("no arguments")
    if (all(vapply(elts, function(x) is.atomic(x) && !is.object(x),
                   NA))) {
        mmm <- .Internal(pmax(na.rm, ...))
        mostattributes(mmm) <- attributes(elts[[1L]])
    }
    else {
        mmm <- elts[[1L]]
        has.na <- FALSE
        as <- methods::as
        asL <- function(x) if (isS4(x))
            as(x, "logical")
        else x
        for (each in elts[-1L]) {
            l1 <- length(each)
            l2 <- length(mmm)
            if (l2 && (l2 < l1 || !l1)) {
                if (l1%%l2)
                    warning("an argument will be fractionally recycled")
                mmm <- rep(mmm, length.out = l1)
            }
            else if (l1 && (l1 < l2 || !l2)) {
                if (l2%%l1)
                    warning("an argument will be fractionally recycled")
                each <- rep(each, length.out = l2)
            }
            na.m <- is.na(mmm)
            na.e <- is.na(each)
            if (has.na || (has.na <- any(na.m) || any(na.e))) {
                if (any(na.m <- asL(na.m)))
                    mmm[na.m] <- each[na.m]
                if (any(na.e <- asL(na.e)))
                    each[na.e] <- mmm[na.e]
            }
            nS4 <- !isS4(mmm)
            if (isS4(change <- mmm < each) && (nS4 || !isS4(each)))
                change <- as(change, "logical")
            change <- change & !is.na(change)
            print(change)
            mmm[change] <- each[change]
            print(each)
            print(each[change])
            if (has.na && !na.rm)
                mmm[na.m | na.e] <- NA
            if (nS4)
                mostattributes(mmm) <- attributes(elts[[1L]])
        }
    }
    mmm
}
Now, we check the print output based on applying pmax2 on 'df'
invisible(pmax2(df, 0.5))
# V1 V2 V3 V4 V5
# [1,] TRUE TRUE TRUE TRUE FALSE
# [2,] TRUE FALSE TRUE TRUE TRUE
# [3,] FALSE FALSE TRUE TRUE FALSE
# [4,] FALSE TRUE TRUE TRUE TRUE
# [5,] FALSE TRUE TRUE FALSE TRUE
# [6,] FALSE FALSE TRUE TRUE TRUE
# [7,] TRUE TRUE TRUE FALSE TRUE
# [8,] FALSE FALSE TRUE FALSE FALSE
# [9,] FALSE FALSE TRUE FALSE TRUE
#[10,] TRUE TRUE TRUE TRUE FALSE
#[11,] FALSE TRUE TRUE TRUE TRUE
#[12,] TRUE TRUE FALSE TRUE FALSE
#[13,] FALSE TRUE TRUE TRUE FALSE
#[14,] FALSE TRUE FALSE FALSE TRUE
#[15,] TRUE FALSE FALSE FALSE TRUE
#[16,] FALSE TRUE FALSE TRUE FALSE
#[17,] TRUE FALSE TRUE FALSE FALSE
#[18,] TRUE FALSE TRUE FALSE TRUE
#[19,] FALSE FALSE TRUE TRUE TRUE
#[20,] TRUE FALSE TRUE FALSE TRUE
#[1] 0.5 0.5 0.5 0.5 0.5
# [1] 0.5 0.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#[41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Note that this is exactly the same output we got with the rep mentioned earlier.
However, on a matrix this loop is not executed: a matrix is atomic and has no class attribute, so the first if branch hands it directly to the internal pmax
invisible(pmax2(mat, 0.5))
# nothing is printed
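The branch taken can be checked directly with the same test that the function applies to each argument. In this sketch, takes_internal_branch is a made-up helper name. The two length() calls show why the second argument ends up recycled to 5 values for the data.frame:

```r
set.seed(24)
mat <- matrix(runif(100), nrow = 20, ncol = 5)
df <- as.data.frame(mat)

# The condition pmax uses to decide whether an argument can go to the internal code
takes_internal_branch <- function(x) is.atomic(x) && !is.object(x)

takes_internal_branch(mat)  # TRUE:  atomic, no class attribute
takes_internal_branch(df)   # FALSE: a list with class "data.frame"

length(mat)  # 100, the number of cells
length(df)   # 5, the number of columns
```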
When comparing against a single value, it is better to call pmax on a matrix than on a data.frame. Otherwise, we can unlist the data.frame or convert it to a matrix
all.equal(c(pmax(mat, .5)), pmax(unlist(df), .5), check.attributes = FALSE)
#[1] TRUE
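If the result should stay a data.frame, another option is to apply pmax column by column, since each column is an ordinary atomic vector. This is a sketch rather than part of the answer above, and df_clamped is a made-up name:

```r
set.seed(24)
mat <- matrix(runif(100), nrow = 20, ncol = 5)
df <- as.data.frame(mat)

# Assigning into df_clamped[] keeps the data.frame structure and column names
df_clamped <- df
df_clamped[] <- lapply(df, pmax, 0.5)

anyNA(df_clamped)
#[1] FALSE
all.equal(as.matrix(df_clamped), pmax(mat, 0.5), check.attributes = FALSE)
#[1] TRUE
```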
data
set.seed(24)
mat <- matrix(runif(100), nrow = 20, ncol = 5)
df <- as.data.frame(mat)

R Equality while ignoring NAs

Is there an equivalent of == but with the result that x != NA if x is not NA?
The following does what I want, but it's clunky:
mapply(identical, vec1, vec2)
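If you are comparing against a single value (or a set of values), just replace "==" with %in%. Note that %in% tests membership rather than comparing element by element, so it is not a drop-in replacement for vec1 == vec2.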
Example:
> df <- data.frame(col1= c("a", "b", NA), col2= 1:3)
> df
col1 col2
1 a 1
2 b 2
3 <NA> 3
> df[df$col1=="a", ]
col1 col2
1 a 1
NA <NA> NA
> df[df$col1%in%"a", ]
col1 col2
1 a 1
> "x"==NA
[1] NA
> "x"%in%NA
[1] FALSE
1 == NA returns a logical NA rather than TRUE or FALSE. If you want NA to count as FALSE, you could add a second condition:
set.seed(1)
x <- 1:10
x[4] <- NA
y <- sample(1:10, 10)
x <= y
# [1] TRUE TRUE TRUE NA FALSE TRUE TRUE FALSE TRUE FALSE
x <= y & !is.na(x)
# [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
You could also use a second processing step to convert all the NA values from your equality test to FALSE.
foo <- x <= y
foo[is.na(foo)] <- FALSE
foo
# [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
Also, for what it's worth, NA == NA returns NA, as does NA != NA.
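Putting these pieces together, an element-wise comparison that never returns NA could look like the sketch below. eq_na is a made-up name, and the sketch assumes both vectors have the same length and that two NAs should count as equal, as they do with mapply(identical, vec1, vec2):

```r
eq_na <- function(a, b) {
  both_na <- is.na(a) & is.na(b)   # NA on both sides counts as equal
  same <- a == b                   # NA wherever either side is NA
  same[is.na(same)] <- FALSE       # turn those NAs into FALSE
  same | both_na
}

eq_na(c(1, NA, 3, NA), c(1, NA, 4, 2))
# [1]  TRUE  TRUE FALSE FALSE
```

If NA compared with NA should be FALSE instead, drop the both_na term and return same.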
The == operator is often used in combination with filtering data.frames.
In that situation, dplyr::filter will retain only rows where your condition evaluates to TRUE, unlike [. That effectively implements a version of == in which 1 == NA evaluates to FALSE.
Example:
> df <- data.frame(col1= c("a", "b", NA), col2= 1:3)
> df
col1 col2
1 a 1
2 b 2
3 <NA> 3
> dplyr::filter(df, col1=="a")
col1 col2
1 a 1
Why not use base R:
df <- data.frame(col1 = c("a", "b", NA), col2 = 1:3, col3 = 11:13)
df
subset(x = df, subset = col1=="a", select = c(col1, col2))
# col1 col2
# 1 a 1
or with vectors:
df <- c("a", "b", NA)
subset(x = df, subset = df == "a")
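If you would rather keep using [ on a data frame, wrapping the condition in which() has the same effect, because which() returns only the positions where the condition is TRUE and skips NA. A minimal sketch with the example data frame from above (res is a made-up name):

```r
df <- data.frame(col1 = c("a", "b", NA), col2 = 1:3,
                 stringsAsFactors = FALSE)

# df$col1 == "a" is TRUE, FALSE, NA; which() keeps only position 1
res <- df[which(df$col1 == "a"), ]
res
#   col1 col2
# 1    a    1
```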
