Subsetting a logical vector with a logical vector in R - r

(Note: following the suggestions in the comments, I have changed the original title "Comparing the content of two vectors in R?" to "Subsetting a logical vector with a logical vector in R")
I am trying to understand the following R code snippet (by the way, the question originated while I was trying to understand this example.)
I have a vector a defined as:
a = c(FALSE, FALSE)
Then I can define b:
b <- a
I check b's content and everything looks OK:
b
#> [1] FALSE FALSE
Question
Now, what is the following code doing? Is it checking if b is equal to "not" a?
b[!a]
#> [1] FALSE FALSE
But if I try b[a] the result is different:
b[a]
#> logical(0)
I also tried a different example:
a = c(FALSE, TRUE)
b <- a
b
#> [1] FALSE TRUE
Now I try the same operations as above, but I get a different result:
b[!a]
#> [1] FALSE
b[a]
#> [1] TRUE
Created on 2021-03-23 by the reprex package (v0.3.0)

[] is used for subsetting a vector. You can subset a vector with integer indices or with logical values.
When you use a logical vector to subset a vector, an element is selected if the value at the same position in the logical index is TRUE. In your example you are subsetting a logical vector with a logical vector, which can be confusing. Let's take another example:
a <- c(10, 20)
b <- c(TRUE, FALSE)
a[b]
#[1] 10
Since the first value of b is TRUE and the second is FALSE, only the first element of a is selected.
If we negate b instead, 20 is selected, because !b returns FALSE TRUE.
a[!b]
#[1] 20
Now apply the same logic to your example:
a = c(FALSE, FALSE)
b <- a
!a returns TRUE TRUE, so both values are selected when you do b[!a], and none of the values is selected when you do b[a], which is why it returns logical(0).
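The second example from the question follows the same rule. As a small sketch (one way to check the behaviour, not the only one), the logical index can be compared with the equivalent integer index built by which():

```r
# Second example from the question: a and b are the same logical vector
a <- c(FALSE, TRUE)
b <- a

# !a is c(TRUE, FALSE), so only the first element of b (FALSE) is kept
not_a_subset <- b[!a]
# a is c(FALSE, TRUE), so only the second element of b (TRUE) is kept
a_subset <- b[a]

# Logical subsetting keeps the same elements as integer subsetting
# with the positions of the TRUE values
print(identical(b[a], b[which(a)]))    # TRUE
print(identical(b[!a], b[which(!a)]))  # TRUE

# First example: an index with no TRUE values selects nothing
a0 <- c(FALSE, FALSE)
print(a0[a0])  # logical(0)
```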

b[!a] returns the values of b at the positions where !a is TRUE.
Here !a is T, T, so the first and second values of b are returned, which are F and F.
To see this more clearly, consider:
a <- 1:4
b <- c(T, T, F, T)
Now a[!b] is equivalent to a[c(F, F, T, F)], i.e. only the third element of a is returned.
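A quick way to confirm this in a session is to compare both forms directly; a small sketch using the same a and b:

```r
a <- 1:4
b <- c(TRUE, TRUE, FALSE, TRUE)

# ! flips every element of b: FALSE FALSE TRUE FALSE
flipped <- !b

# Only the third position of !b is TRUE, so only a[3] is returned
third_only <- a[!b]
print(third_only)  # [1] 3
print(identical(a[!b], a[c(FALSE, FALSE, TRUE, FALSE)]))  # TRUE
```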

Related

Indexing tables of logical vectors with zero counts in R

I have the following:
> v1 <- c(T, F, T, T, F)
> table(v1)
v1
FALSE TRUE
2 3
To index the 'True' column, I do this:
> table(v1)[2]
TRUE
3
However, if a logical vector contains only FALSE values, the table will only have one column, and the previous strategy no longer works to retrieve the TRUE column:
> v2 <- c(F, F, F, F, F)
> table(v2)[2]
<NA>
NA
How can one consistently index the TRUE column regardless of whether its count is zero? One solution is to do this:
> table(factor(v2, levels= c("FALSE", "TRUE")))[2]
TRUE
0
But this feels like cheating because it treats TRUE and FALSE as characters that become levels of a factor. For non-logical vectors, this behaviour is understandable, because there is no way of knowing what levels exist. (1) Is there a way to force table() to take into consideration the fact that logical vectors only take on two values and always present two columns for them? (2) Am I overthinking this and the last command is an acceptable and robust practice?
Convert to a factor with the levels specified, so that it always has two levels. Without a TRUE value, table has no way to create a count for TRUE, because that information is not present in the data; with the factor levels set, it reports the TRUE count as 0:
table(factor(v2, levels = c(FALSE, TRUE)))[2]
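A slightly more robust variant of the same factor approach (a sketch, not the only option) is to index by the level name "TRUE" instead of by position, so the result does not depend on the order of the levels:

```r
v1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
v2 <- c(FALSE, FALSE, FALSE, FALSE, FALSE)

# Fixing the levels guarantees that both FALSE and TRUE columns exist,
# even when one of them has a zero count
tab1 <- table(factor(v1, levels = c(FALSE, TRUE)))
tab2 <- table(factor(v2, levels = c(FALSE, TRUE)))

# Index by name rather than by position
print(tab1["TRUE"])
print(tab2["TRUE"])
```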
It is not clear why the TRUE values of a logical vector need to be counted with table and then extracted by the TRUE/FALSE names. It can be done more easily with sum, as TRUE -> 1 and FALSE -> 0; negating (!) reverses this:
> sum(v1)
[1] 3
> sum(!v1)
[1] 2
> sum(v2)
[1] 0
> sum(!v2)
[1] 5
Because the logical case is so specific to the requirements, I would write a dedicated function:
logitable <- function(x)
{
  x <- as.logical(x)
  kNA <- sum(is.na(x))          # number of NA values
  kT <- sum(x, na.rm = TRUE)    # number of TRUE values
  kF <- length(x) - kT - kNA    # everything else is FALSE
  return(structure(
    c(kT, kF, kNA),
    names = c("TRUE", "FALSE", "NA")
  ))
}
Please note that the returned object is not of class "table"; let me know if returning such an object is important to you.
Test with:
logitable(c(T,F,T,F,T))
logitable(c(T,T,T,T,T))
logitable(c(F,F,F,F,F))
logitable(c(T,F,T,F,NA))
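If an object of class "table" is needed, one possible sketch (an alternative, not the logitable function above; the name logitable2 is hypothetical) combines fixed factor levels with useNA = "always", which keeps the TRUE, FALSE and NA counts even when they are zero:

```r
# Hypothetical helper; returns a genuine "table" object
logitable2 <- function(x) {
  x <- as.logical(x)
  # Fixed levels keep zero-count columns; useNA = "always" adds an NA column
  table(factor(x, levels = c(TRUE, FALSE)), useNA = "always")
}

t1 <- logitable2(c(TRUE, FALSE, TRUE, FALSE, TRUE))
t2 <- logitable2(c(FALSE, FALSE, FALSE, FALSE, FALSE))
t3 <- logitable2(c(TRUE, FALSE, TRUE, FALSE, NA))
print(t3)
```

The counts come out in the same TRUE, FALSE, NA order as logitable, so the two are easy to compare.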

Checking NA values and difference

What is the difference between sum(is.na(bollywood), T) and sum(is.na(bollywood))?
I have tried both of these but they are giving different output and I'm not sure of the reason.
The T at the end was presumably meant for the na.rm argument, but passed without a name it is simply another value to add. It is better to spell out TRUE instead of T: TRUE cannot be used as an object name, while T can be reassigned, which can lead to buggy situations. The intended call would be
sum(is.na(bollywood), na.rm = TRUE)
Here, there is no need for na.rm, as is.na returns only TRUE/FALSE depending on whether each element is NA. sum gives the number of TRUE elements (TRUE counts as 1, FALSE as 0).
Using a small reproducible example
bollywood <- c('a', 'b', NA, 'd', NA)
is.na(bollywood)
#[1] FALSE FALSE TRUE FALSE TRUE
Here there are two NAs, so sum returns 2
sum(is.na(bollywood))
#[1] 2
Now, we define an object T
T <- 5
sum(is.na(bollywood), T)
#[1] 7
So here it adds the 2 to the value of T, which is 5.
Instead, it should be
sum(is.na(bollywood), na.rm = TRUE)
#[1] 2
As noted above, na.rm is not needed at all. The usage shown in ?sum is
sum(..., na.rm = FALSE)
i.e. ... can take multiple arguments, so is.na(bollywood) is the first argument, T (the object created above) the second, and so on. A value is only used as na.rm when it is passed by that name.
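To see where the different output in the question most likely comes from, note that in a fresh session T is simply TRUE, which sum counts as 1. A minimal sketch (the literal TRUE is used here so the result does not depend on whether T has been reassigned):

```r
bollywood <- c('a', 'b', NA, 'd', NA)

# Count of missing values: is.na() only ever returns TRUE/FALSE
n_na <- sum(is.na(bollywood))

# Passed positionally, TRUE is just one more value to add (TRUE counts as 1)
n_plus_true <- sum(is.na(bollywood), TRUE)

# Passed by name, it sets na.rm and leaves the count unchanged
n_na_rm <- sum(is.na(bollywood), na.rm = TRUE)

print(c(n_na, n_plus_true, n_na_rm))  # [1] 2 3 2
```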

check elements of a list

I have a list of 1000 lists of booleans which is the result of a duplicated() check on the original lists of numbers. I need to find which of these lists contains a TRUE result and I need to know the position of the list where it appears in the 1000. i.e. I can then type
my.list[[456]]
[1] FALSE FALSE FALSE TRUE FALSE
And then use this to delete the elements from my list where a TRUE appears
# An example
l <- list(c(TRUE, FALSE), c(FALSE, FALSE), c(FALSE))
# The indices you want
l2 <- lapply(l, which)
# The number of TRUEs for each element of l
l3 <- lengths(l2)
# The initial list, without the elements containing a TRUE
l4 <- l[l3 == 0]
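If only the positions are needed, a shorter sketch (an alternative to the lapply/lengths steps above, assuming every element is a plain logical vector as returned by duplicated(), which never contains NA) applies any() to each element:

```r
l <- list(c(TRUE, FALSE), c(FALSE, FALSE), c(FALSE))

# One TRUE/FALSE per list element: does it contain at least one TRUE?
has_true <- vapply(l, any, logical(1))

# Positions of the elements containing a TRUE
positions <- which(has_true)
print(positions)  # [1] 1

# Keep only the elements without any TRUE
l_clean <- l[!has_true]
```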

Filtering with logical + NA values in one column

I have the following data frame:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
unique(df$Logical)
length(df$Logical == TRUE)
I'm trying to figure out how many TRUE values I have in my df$Logical column, but it seems I'm missing something: length(df$Logical == TRUE) returns the number of records in the column.
What am I doing wrong in this particular case? The desired result is 2 TRUE values in df$Logical. Many thanks in advance.
We need to specify the string in lower case, as the values are 'true'/'false' and not TRUE/FALSE. Also, sum should be used instead of length: sum counts the TRUE elements.
sum(df$Logical == "true")
#[1] NA
Because the column contains an NA element, the comparison is NA at that position and sum returns NA, so use na.rm = TRUE
sum(df$Logical=='true', na.rm = TRUE)
#[1] 2
The length of the logical vector returned by the comparison is the same as the length of the column, i.e. the number of rows of the dataset.
length(df$Logical == "true")
#[1] 6
because it returns a logical vector of length 6.
df$Logical == "true"
#[1] TRUE NA FALSE TRUE FALSE FALSE
To get the counts of both true and false, we can use table
table(df$Logical)
First of all, "true" and "false" as you put them into your data frame are not Booleans but plain strings.
Moreover, length(df$Logical == TRUE) will always return 6 in this example, i.e. the number of elements in the column. This is because df$Logical == TRUE returns a vector of TRUE, FALSE or NA values, one per element. In your case it will return
FALSE NA FALSE FALSE FALSE FALSE
because the comparison is never TRUE: TRUE is coerced to the string "TRUE", which is not equal to "true", and the NA element gives NA. However, the length of this vector is still 6, as returned by length().
To overcome the problem you might define your data frame like this
df <- data.frame("Logical"=c(TRUE,NA,FALSE,TRUE,NA,FALSE),
"Numeric"=c(1,2,3,4,5,6))
And then you can sum up the number of TRUE
sum(df$Logical == TRUE, na.rm = T)
[1] 2
na.rm = T is important here because otherwise sum will return NA if one or more elements are NA.
Alternatively, you can work with strings to indicate true or false (with empty strings treated like NA).
Then you could write
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
sum(df$Logical == "true", na.rm = T)
[1] 2
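Another option, since as.logical() recognises the strings "true" and "false" (and turns anything else, such as "", into NA), is to convert the column once and then work with a real logical vector. A sketch using the data frame from the question:

```r
df <- data.frame(Logical = c("true", NA, "false", "true", "", "false"),
                 Numeric = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# "true" -> TRUE, "false" -> FALSE, "" and NA -> NA
df$Logical <- as.logical(df$Logical)
print(df$Logical)

# Count TRUE values, ignoring NA
n_true <- sum(df$Logical, na.rm = TRUE)
print(n_true)  # [1] 2

# Counts of FALSE, TRUE and NA in one call
print(table(df$Logical, useNA = "ifany"))
```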

Index a Particular Numeric Vector From a List of Vectors in R

In R, for the sake of example, I have a list composed of equal-length numeric vectors of form similar to:
list <- list(c(1,2,3),c(1,3,2),c(2,1,3))
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 2 1 3
...
Every element of the list is unique. I want to get the index number of the element x <- c(2,1,3), or any other particular numeric vector within the list.
I've attempted using match(x,list), which gives a vector full of NA, and which(list==c(1,2,3)), which gives me a "(list) object cannot be coerced to type 'double'" error. Coercing the list to different types didn't seem to make a difference for the which function. I also attempted various grep* functions, but these don't return exact numeric vector matches. Using find(c(1,2,3),list) or even some fancy sapply, which and %in% combinations didn't give me what I was looking for. I feel like I have a type problem. Any suggestions?
--Update--
Summary of Solutions
Thanks for your replies. The method in the comment for this question is clean and works well (via akrun).
> which(paste(list)==deparse(x))
[1] 25
The next method didn't work correctly
> which(duplicated(c(x, list(y), fromLast = TRUE)))
[1] 49
> y
[1] 1 2 3
This sounds good, but in the next block you can see the problem
> y<-c(1,3,2)
> which(duplicated(c(list, list(y), fromLast = TRUE)))
[1] 49
More fundamentally, there are only 48 elements in the list I was using.
The last method works well (via BondedDust), and I would guess it is more efficient using an apply function:
> which( sapply(list, identical, y ))
[1] 25
match works fine if you pass it the right data.
L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
match(list(c(2,1,3)), L)
#[1] 3
Beware that this works via coercing lists to character, so fringe cases will fail, with a hat-tip to @nicola:
match(list(1:3),L)
#[1] NA
even though:
1:3 == c(1,2,3)
#[1] TRUE TRUE TRUE
Although arguably:
identical(1:3,c(1,2,3))
#[1] FALSE
identical(1:3,c(1L,2L,3L))
#[1] TRUE
You can use duplicated(). If we append the vector to match to the end of the original list and pass fromLast = TRUE to duplicated(), the earlier copy in the list is flagged as a duplicate. Then we can use which() to get the index. Note that fromLast must be an argument of duplicated(), not of c().
which(duplicated(c(list, list(c(2, 1, 3))), fromLast = TRUE))
# [1] 3
Or you could add it as the first element and subtract 1 from the result.
which(duplicated(c(list(c(2, 1, 3)), list))) - 1L
# [1] 3
Note that the type always matters with this type of comparison. When comparing integers and numerics, you will need to convert doubles to integers for this to run without issue. For example, 1:3 is not the same type as c(1, 2, 3).
> L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
> sapply(L, identical, c(2,1,3))
[1] FALSE FALSE TRUE
> which( sapply(L, identical, c(2,1,3)) )
[1] 3
This would be slightly less restrictive in its test:
> which( sapply(L, function(x,y){all(x==y)}, c(1:3)) )
[1] 1
Try:
vapply(list,function(z) all(z==x),TRUE)
#[1] FALSE FALSE TRUE
Wrapping the above line in which() gives you the index of the matching list element.
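When the types may differ (integer 1:3 versus double c(1, 2, 3)), a middle ground between identical() and the all(x == y) test above is isTRUE(all.equal(...)). A sketch, assuming "equal" should mean equal values up to all.equal's small default numeric tolerance:

```r
L <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3))
target <- 1:3  # integer, while the list elements are double

# identical() also compares storage type, so nothing matches here
strict <- which(vapply(L, identical, logical(1), target))

# isTRUE(all.equal()) compares values (integer vs double is fine, within a
# small tolerance) and fails on a length mismatch, which all(x == y) can
# hide through recycling
tolerant <- which(vapply(L, function(el) isTRUE(all.equal(el, target)), logical(1)))

print(strict)    # integer(0)
print(tolerant)  # [1] 1
```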
