Understanding NA and NaN in R [duplicate]

Here is a vector
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
I'd like a simple function that returns TRUE every time there is a TRUE in "a", and FALSE every time there is a FALSE or an NA in "a".
The following three attempts do not work:
a == TRUE
identical(TRUE, a)
isTRUE(a)
Here is a solution:
a[-which(is.na(a))]
but it does not seem straightforward or easy.
Is there another solution?
Here are some functions (and operators) I know:
identical()
isTRUE()
is.na()
na.rm (an argument, as in sum(x, na.rm = TRUE), rather than a function)
&
|
!
What other functions (operators, tips, whatever...) are useful for dealing with TRUE, FALSE, NA and NaN?
What are the differences between NA and NaN?
Are there other "logical things" than TRUE, FALSE, NA and NaN?
Thanks a lot!

You don't need to wrap anything in a function; the following works:
a = c(T,F,NA)
a %in% TRUE
[1] TRUE FALSE FALSE
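Why this works where == fails: %in% is built on match(), which treats NA as an ordinary value, so an NA element simply fails to match TRUE instead of propagating. A short comparison on the vector from the question:

```r
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)

a == TRUE    # TRUE FALSE FALSE NA FALSE TRUE NA FALSE TRUE: NA propagates
a %in% TRUE  # TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
NA %in% NA   # TRUE: match() treats NA as a value that can be matched
```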

To answer your questions in order:
1) The == operator does not treat NAs the way you might expect: any comparison involving NA returns NA. A very useful function is this compareNA function from r-cookbook.com:
compareNA <- function(v1, v2) {
  # Returns TRUE wherever elements are the same, including where both are NA,
  # and FALSE everywhere else.
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  return(same)
}
2) NA stands for "Not Available" and is not the same as NaN ("Not a Number"). NA is the general placeholder for missing data; NaN is normally produced by an undefined numerical operation, such as 0/0 or log(-1).
3) I'm not sure what you mean by "logical things". Many data types, including numeric vectors, can be used as input to logical operators. You might want to read the R help page on logical operators: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Logic.html.
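The three points can be checked with a short sketch. compareNA() is restated so the snippet runs on its own, and the vectors x, y and z are made-up examples. Because is.na(NaN) is TRUE, compareNA() also treats an NA and a NaN at the same position as matching.

```r
# Point 1: compareNA() as given above
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  same
}
x <- c(1, NA, 3, NA)
y <- c(1, NA, 4, 5)
compareNA(x, y)       # TRUE TRUE FALSE FALSE
all(compareNA(x, y))  # FALSE: a single answer for "same vectors, NA as a value?"

# Point 2: is.na() is TRUE for both NA and NaN, is.nan() only for NaN
z <- c(1, NA, NaN, 0/0)  # 0/0 gives NaN silently; log(-1) gives NaN with a warning
is.na(z)                 # FALSE TRUE TRUE TRUE
is.nan(z)                # FALSE FALSE TRUE TRUE

# Point 3: the logical type has exactly three values; NaN is a double
typeof(c(TRUE, FALSE, NA))  # "logical"
typeof(NaN)                 # "double"
c(0, 3) & TRUE              # FALSE TRUE: 0 counts as FALSE, other numbers as TRUE
NaN & TRUE                  # NA: NaN becomes NA in logical operations
```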
Hope this helps!

You want TRUE to stay TRUE and FALSE to stay FALSE, so the only real change is that NA must become FALSE. Make exactly that change:
a[ is.na(a) ] <- FALSE
Alternatively, rephrase the condition as "TRUE only if it is TRUE and not missing":
a <- a & !is.na(a)
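The second form works because R's & and | use three-valued logic: FALSE & NA is FALSE (the answer is FALSE whatever the missing value is), while TRUE & NA stays NA. A sketch confirming that both approaches give the same vector (b1 and b2 are illustrative names):

```r
NA & FALSE  # FALSE
NA & TRUE   # NA
NA | TRUE   # TRUE
NA | FALSE  # NA

a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
b1 <- a
b1[is.na(b1)] <- FALSE  # approach 1: overwrite NA in a copy
b2 <- a & !is.na(a)     # approach 2: TRUE only where a is TRUE and not missing
identical(b1, b2)       # TRUE
```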

Taking Ben Bolker's suggestion above, you could define your own function modeled on the is.na() syntax:
is.true <- function(x) {
  !is.na(x) & x
}
a = c(T,F,F,NA,F,T,NA,F,T)
is.true(a)
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
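This also explains why isTRUE(a) in the question fails: isTRUE() only returns TRUE for a single, non-missing TRUE, so a length-9 vector gives one FALSE. Another base tool that already skips NA is which(), which returns the positions of TRUE elements only; turning those positions back into a full-length vector reproduces is.true(). A sketch (the name full is illustrative):

```r
is.true <- function(x) !is.na(x) & x
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)

isTRUE(a)                    # FALSE: isTRUE() is for single values
which(a)                     # 1 6 9: NA positions are skipped
full <- seq_along(a) %in% which(a)
identical(full, is.true(a))  # TRUE
```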
This also works for subsetting data:
b <- 1:9
df <- as.data.frame(cbind(a,b))
df[is.true(df$a),]
a b
1 1 1
6 1 6
9 1 9
It also avoids accidentally including all-NA rows wherever the data contain NA, which is what happens with ==:
df[df$a == TRUE,]
a b
1 1 1
NA NA NA
6 1 6
NA.1 NA NA
9 1 9
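Base R's subset() also drops rows where the condition is NA (its help page states that missing values are taken as false), as does wrapping the condition in which(). A sketch rebuilding the data frame from the answer and counting the rows each approach returns:

```r
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
b <- 1:9
df <- as.data.frame(cbind(a, b))  # cbind() coerces a to 1/0/NA

nrow(df[df$a == TRUE, ])         # 5: three matches plus two all-NA rows
nrow(subset(df, a == TRUE))      # 3: NA conditions are dropped
nrow(df[which(df$a == TRUE), ])  # 3: which() skips NA
```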

I like the is.element() function, which gives the same result as a %in% TRUE:
is.element(a, T)

Related

Indexing tables of logical vectors with zero counts in R

I have the following:
> v1 <- c(T, F, T, T, F)
> table(v1)
v1
FALSE TRUE
2 3
To index the 'True' column, I do this:
> table(v1)[2]
TRUE
3
However, if a logical vector contains only FALSE values, the table has only one column, and the previous strategy no longer retrieves the TRUE column:
> v2 <- c(F, F, F, F, F)
> table(v2)[2]
<NA>
NA
How can one consistently index the TRUE column regardless of whether its count is zero? One solution is this:
> table(factor(v2, levels= c("FALSE", "TRUE")))[2]
TRUE
0
But this feels like cheating because it treats TRUE and FALSE as characters that become levels of a factor. For non-logical vectors, this behaviour is understandable, because there is no way of knowing what levels exist. (1) Is there a way to force table() to take into consideration the fact that logical vectors only take on two values and always present two columns for them? (2) Am I overthinking this and the last command is an acceptable and robust practice?
Convert to a factor with the levels specified so that it always has two levels. Without a TRUE value, table() cannot create a TRUE count, because that information is not present in the data; with the factor levels fixed, it reports the TRUE count as 0:
table(factor(v2, levels = c(FALSE, TRUE)))[2]
It is not clear why the TRUE values of a logical vector need to be counted with table() and then extracted by the TRUE/FALSE names. It is easier with sum(), since TRUE is coerced to 1 and FALSE to 0; negating with ! reverses this:
> sum(v1)
[1] 3
> sum(!v1)
[1] 2
> sum(v2)
[1] 0
> sum(!v2)
[1] 5
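Combining the factor approach with indexing by name instead of by position avoids relying on the TRUE column being second. A sketch; count_true() is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: number of TRUE values, always defined (0 if none)
count_true <- function(x) {
  # Fixing the levels guarantees both FALSE and TRUE columns exist, even at zero
  tab <- table(factor(x, levels = c(FALSE, TRUE)))
  as.integer(tab["TRUE"])  # index by name rather than by position
}

v1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
v2 <- c(FALSE, FALSE, FALSE, FALSE, FALSE)
count_true(v1)  # 3
count_true(v2)  # 0
```

table() excludes NA by default, so NA elements are not counted as TRUE here; sum(x, na.rm = TRUE) gives the same count.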
Because logical vectors are such a specific case, I would write a dedicated function:
logitable <- function(x)
{
  x <- as.logical(x)
  kNA <- sum(is.na(x))
  kT <- sum(x, na.rm=TRUE)
  kF <- length(x) - kT - kNA
  return(structure(
    c(kT, kF, kNA),
    names = c("TRUE", "FALSE", "NA")
  ))
}
Please note that the returned object is a named integer vector, not an object of class "table"; let me know if returning such an object is important to you.
Test with:
logitable(c(T,F,T,F,T))
logitable(c(T,T,T,T,T))
logitable(c(F,F,F,F,F))
logitable(c(T,F,T,F,NA))
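If a genuine "table" object is needed, as.table() can convert the named vector into a one-dimensional table with TRUE, FALSE and NA as its names. A sketch, with logitable() restated (using c() with quoted names instead of structure(), which is equivalent) so the snippet runs on its own:

```r
logitable <- function(x) {
  x <- as.logical(x)
  kNA <- sum(is.na(x))
  kT <- sum(x, na.rm = TRUE)
  kF <- length(x) - kT - kNA
  c("TRUE" = kT, "FALSE" = kF, "NA" = kNA)
}

counts <- logitable(c(TRUE, FALSE, TRUE, FALSE, NA))
counts               # TRUE 2, FALSE 2, NA 1
tab <- as.table(counts)
class(tab)           # "table"
as.integer(tab["NA"])  # 1
```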

How to check if 2 vectors are the same in which NA is treated as a normal value? [duplicate]

Filtering with logical + NA values in one column

I have the following data frame:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
unique(df$Logical)
length(df$Logical == TRUE)
I'm trying to find out how many TRUE values I have in the df$Logical column, but I seem to be missing something: length(df$Logical == TRUE) returns the number of records in the column.
What am I doing wrong in this case? The desired result is 2 TRUE values in df$Logical. Many thanks in advance.
We need to write the string in lower case, because the values are the strings 'true'/'false', not the logical TRUE/FALSE. Also, use sum instead of length: sum counts the TRUE elements.
sum(df$Logical == "true")
#[1] NA
Because this column contains an NA element, the comparison contains an NA and so does the sum. Use na.rm = TRUE to skip it:
sum(df$Logical=='true', na.rm = TRUE)
#[1] 2
The length of the comparison result, like that of any vector built element by element from the column, equals the number of rows of the dataset.
length(df$Logical == "true")
#[1] 6
because the comparison returns a logical vector of length 6:
df$Logical == "true"
#[1]  TRUE    NA FALSE  TRUE FALSE FALSE
To get the counts of both true and false, we can use table
table(df$Logical)
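Note that table() silently drops NA by default. Its useNA argument ("no", "ifany", "always") adds a count for missing values, which matters here because the column mixes strings, an empty string and an NA:

```r
df <- data.frame("Logical" = c("true", NA, "false", "true", "", "false"),
                 "Numeric" = c(1, 2, 3, 4, 5, 6))

table(df$Logical)                   # "" 1, false 2, true 2: the NA is not shown
table(df$Logical, useNA = "ifany")  # additionally reports <NA> 1
```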
First of all, "true" and "false" as you put them into your data frame are not Booleans but plain strings.
Moreover, length(df$Logical == TRUE) will always return 6 in this example, i.e. the number of elements in the column. This is because df$Logical == TRUE returns a logical vector with one element (TRUE, FALSE or NA) per row. In your case it returns
FALSE NA FALSE FALSE FALSE FALSE
because the boolean expression is never true. However, the length of this will be 6 as returned by length().
To overcome the problem, you might define your data frame like this:
df <- data.frame("Logical"=c(TRUE,NA,FALSE,TRUE,NA,FALSE),
"Numeric"=c(1,2,3,4,5,6))
Then you can sum up the number of TRUE values:
sum(df$Logical == TRUE, na.rm = TRUE)
[1] 2
na.rm = TRUE is important here, because otherwise the sum returns NA if one or more elements are NA.
Alternatively, you can keep strings to indicate true or false (with NA or empty strings for missing values).
Then you could write:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
sum(df$Logical == "true", na.rm = TRUE)
[1] 2
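A third option is to convert the strings into a real logical column once. as.logical() recognises "true"/"false" (as well as "TRUE", "T", "True" and the FALSE counterparts) and turns any other string, including "", into NA. In R versions before 4.0.0, data.frame() stored strings as factors by default; as.logical() has a factor method, so the same call should still work there, though that is worth checking on an old installation.

```r
df <- data.frame("Logical" = c("true", NA, "false", "true", "", "false"),
                 "Numeric" = c(1, 2, 3, 4, 5, 6))

df$Logical <- as.logical(df$Logical)
df$Logical                     # TRUE NA FALSE TRUE NA FALSE
sum(df$Logical, na.rm = TRUE)  # 2
sum(is.na(df$Logical))         # 2: the original NA plus the empty string
```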

Writing a boolean matrix to a string

I have a lower triangular matrix containing TRUE/FALSE values. The matrix is created from a pairwise.t.test and a comparison to the acceptable p-value (p<0.05 => TRUE).
I am trying to output the TRUE values of the matrix as a string with a specific formatting, without using a mess of if conditions. My idea was to use matrix products/sums, but there may be no elegant solution. If you think it is impossible, I would like to know that as well, so I don't keep hitting my head against the wall.
The formatting:
If a pair of values (e.g. 1,2) is TRUE, we output it as "1≠2".
If a value is TRUE with multiple values (e.g. 1 with 2,3), we output it as "1≠2,3".
If a value is TRUE with everyone (e.g. 1 with 2,3,4), we use the word "all", so the output is "1≠all".
If 2 pairs (e.g. 1,2 and 3,4) are TRUE, we separate them with a space; the output is "1≠2 3≠4".
If everything is TRUE, we output "all≠".
As of now, I am doing it manually so I don't really have any code to show. I am open to any ideas :)
Examples:
     1    2     3
2 TRUE   NA    NA
3 TRUE TRUE    NA
4 TRUE TRUE FALSE
The string for this matrix would be "1,2≠all" because 1 and 2 are true with everyone.
      1    2     3
2 FALSE   NA    NA
3  TRUE TRUE    NA
4  TRUE TRUE FALSE
The string for this matrix would be "1,2≠3,4" because 1 is TRUE with 3 and 4, and 2 is TRUE with 3 and 4.
Test matrices:
mTest = matrix(c(T,T,T,NA,F,T,NA,NA,F),nrow=3,ncol=3) # "1≠all 2≠4"
row.names(mTest) <- c(2,3,4) ; colnames(mTest) <- c(1,2,3)
mTest[] = c(T,F,T,NA,F,T,NA,NA,F) # "1≠2 1,2≠4"
mTest[] = c(T,T,T,NA,T,F,NA,NA,T) # "1,3≠all"
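The full grouping rules (merging items into "1,2≠3,4", or the final "all≠") go beyond a short example, but a possible first step is sketched below under the assumption that rows name the larger item and columns the smaller one, as in the test matrices. which(..., arr.ind = TRUE) pulls out the TRUE pairs (it skips both FALSE and NA), and comparing each item's partner set with all other items decides where "all" applies. The variable names (lo, hi, partners, differs_from_all) are illustrative only.

```r
# First test matrix from the question: rows are items 2-4, columns items 1-3
mTest <- matrix(c(TRUE, TRUE, TRUE, NA, FALSE, TRUE, NA, NA, FALSE),
                nrow = 3, ncol = 3,
                dimnames = list(c("2", "3", "4"), c("1", "2", "3")))

# (row, col) indices of TRUE cells; FALSE and NA cells are skipped
idx <- which(mTest, arr.ind = TRUE)
lo <- as.integer(colnames(mTest)[idx[, "col"]])
hi <- as.integer(rownames(mTest)[idx[, "row"]])
pair_strings <- paste0(lo, "\u2260", hi)
pair_strings  # "1≠2" "1≠3" "1≠4" "2≠4"

# Symmetric partner set of each item, used to decide where "all" applies
items <- sort(unique(c(as.integer(colnames(mTest)), as.integer(rownames(mTest)))))
partners <- lapply(items, function(i) sort(c(hi[lo == i], lo[hi == i])))
names(partners) <- items
differs_from_all <- vapply(items, function(i) {
  setequal(partners[[as.character(i)]], setdiff(items, i))
}, logical(1))
differs_from_all  # TRUE FALSE FALSE FALSE: only item 1 differs from all others
```

Grouping items whose partner sets coincide, and joining them with commas, would be the remaining step.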

Dealing with TRUE, FALSE, NA and NaN
