Writing a boolean matrix to a string - r

I have a lower triangular matrix containing TRUE/FALSE values. The matrix is created from a pairwise.t.test and a comparison to the acceptable p-value (p<0.05 => TRUE).
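For illustration only, here is a sketch of how such a matrix arises. It uses R's built-in airquality data (the example data from the pairwise.t.test() help page) rather than the asker's data. The p.value component is lower triangular with NA above the diagonal, so comparing it to 0.05 gives the TRUE/FALSE/NA layout used in the examples that follow:

```r
# Pairwise t-tests of Ozone between months (example data shipped with R)
pt <- with(airquality, pairwise.t.test(Ozone, Month))
# $p.value: rows are groups 2..k, columns are groups 1..k-1, NA above the diagonal
sig <- pt$p.value < 0.05   # TRUE where the pair differs at the 5% level; NA stays NA
sig
```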
I am trying to write the TRUE entries of the matrix to a string in a specific format, without a mess of if conditions. My idea was to use matrix products/sums, but there may be no elegant solution. If you think it's impossible, I would like to know that as well, so I don't keep hitting my head against the wall.
The formatting:
If a pair of values (e.g. 1,2) is TRUE, we output it as "1≠2".
If a value is TRUE with several values (e.g. 1 with 2,3), we output "1≠2,3".
If a value is TRUE with every other value (e.g. 1 with 2,3,4), we use the word "all": the output is "1≠all".
If two pairs (e.g. 1,2 and 3,4) are TRUE, we separate them with a space: the output is "1≠2 3≠4".
If everything is TRUE, we output "all≠".
As of now, I am doing it manually so I don't really have any code to show. I am open to any ideas :)
Examples:
     1    2     3
2 TRUE   NA    NA
3 TRUE TRUE    NA
4 TRUE TRUE FALSE
The string for this matrix would be "1,2≠all" because 1 and 2 are true with everyone.
      1    2     3
2 FALSE   NA    NA
3  TRUE TRUE    NA
4  TRUE TRUE FALSE
The string for this matrix would be "1,2≠3,4", because 1 is TRUE with 3 and 4, and 2 is TRUE with 3 and 4.
Test matrices:
mTest = matrix(c(T,T,T,NA,F,T,NA,NA,F),nrow=3,ncol=3) # "1≠all 2≠4"
row.names(mTest) <- c(2,3,4) ; colnames(mTest) <- c(1,2,3)
mTest[] = c(T,F,T,NA,F,T,NA,NA,F) # "1≠2 1,2≠4"
mTest[] = c(T,T,T,NA,T,F,NA,NA,T) # "1,3≠all"
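No answer is included here, so the following is only one possible sketch, not a definitive method. The function name bool_matrix_to_string is hypothetical, and the grouping rule is my own reading of the examples: a group that differs from every other group is reported as "x≠all"; for the remaining groups, each right-hand group j collects the earlier groups it differs from, and right-hand groups that share the same left-hand set are merged. It assumes the matrix has row and column names, as pairwise.t.test() produces. Reading each test matrix column-major (as matrix() does), it reproduces both worked examples and the last two test strings; for the first test matrix it gives "1≠all 2≠4", because in that matrix group 2 differs from group 4, not from group 3.

```r
# Hypothetical helper (not from any package): lower-triangular logical matrix
# -> compact "≠" string, using the grouping rule described above.
bool_matrix_to_string <- function(m) {
  # All group labels: columns hold groups 1..k-1, rows hold groups 2..k
  labs <- union(colnames(m), rownames(m))
  k <- length(labs)
  # Full symmetric k x k relation; NA (upper triangle) is treated as FALSE
  rel <- matrix(FALSE, k, k, dimnames = list(labs, labs))
  rel[rownames(m), colnames(m)] <- m %in% TRUE
  rel <- rel | t(rel)
  neq <- "\u2260"   # the "≠" character, written as an escape for portability
  # Groups that differ from every other group
  is_all <- rowSums(rel) == k - 1
  if (all(is_all)) return(paste0("all", neq))
  parts <- character(0)
  if (any(is_all)) {
    parts <- paste0(paste(labs[is_all], collapse = ","), neq, "all")
  }
  # For each remaining group j, list the earlier remaining groups it differs from
  rest <- labs[!is_all]
  lhs <- vapply(seq_along(rest), function(j) {
    earlier <- rest[seq_len(j - 1)]
    paste(earlier[rel[earlier, rest[j]]], collapse = ",")
  }, character(1))
  # Merge right-hand groups that share the same left-hand set
  keep <- lhs != ""
  if (any(keep)) {
    rhs <- split(rest[keep], factor(lhs[keep], levels = unique(lhs[keep])))
    parts <- c(parts, paste0(names(rhs), neq,
                             vapply(rhs, paste, character(1), collapse = ",")))
  }
  paste(parts, collapse = " ")
}

# Usage with the first test matrix from the question
m <- matrix(c(T, T, T, NA, F, T, NA, NA, F), nrow = 3,
            dimnames = list(c(2, 3, 4), c(1, 2, 3)))
bool_matrix_to_string(m)
```

The same function can be applied directly to pairwise.t.test(...)$p.value < 0.05, since that matrix carries the group labels as dimnames.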

Related

How to check if 2 vectors are the same in which NA is treated as a normal value? [duplicate]

Here is a vector
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
I'd like a simple function that returns TRUE every time there is a TRUE in "a", and FALSE every time there is a FALSE or an NA in "a".
The following three things do not work:
a == TRUE
identical(TRUE, a)
isTRUE(a)
Here is a solution
a[-which(is.na(a))]
but it doesn't seem like a straightforward or easy solution.
Is there another solution?
Here are some functions (and operators) I know:
identical()
isTRUE()
is.na()
na.rm()
&
|
!
What other functions (operators, tips, whatever...) are useful for dealing with TRUE, FALSE, NA and NaN?
What are the differences between NA and NaN?
Are there other "logical things" than TRUE, FALSE, NA and NaN?
Thanks a lot!
You don't need to wrap anything in a function; the following works:
a = c(T,F,NA)
a %in% TRUE
[1] TRUE FALSE FALSE
To answer your questions in order:
1) The == operator indeed does not treat NAs as you might expect: any comparison with NA returns NA. A very useful function is this compareNA function from r-cookbook.com:
compareNA <- function(v1,v2) {
# This function returns TRUE wherever elements are the same, including NA's,
# and false everywhere else.
same <- (v1 == v2) | (is.na(v1) & is.na(v2))
same[is.na(same)] <- FALSE
return(same)
}
2) NA stands for "Not available" and is not the same as NaN ("Not a Number"). NA is used as a placeholder for missing data; NaNs are normally produced by an undefined numerical operation (taking the log of -1, or similar).
3) I'm not really sure what you mean by "logical things": many different data types, including numeric vectors, can be used as input to logical operators. You might want to read the R page on logical operators: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Logic.html.
Hope this helps!
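A short base-R sketch of the NA/NaN distinction from point 2 (the values are illustrative): is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
x <- c(1, NA, NaN, 0/0)     # 0/0 is another way to get NaN
is.na(x)                    # FALSE TRUE TRUE TRUE: NA and NaN both count as missing
is.nan(x)                   # FALSE FALSE TRUE TRUE: only NaN is "not a number"
typeof(NA)                  # "logical": a bare NA is a logical constant
suppressWarnings(log(-1))   # NaN (R otherwise warns "NaNs produced")
```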
You want TRUE to stay TRUE and FALSE to stay FALSE; the only real change is that NA needs to become FALSE, so make exactly that change:
a[ is.na(a) ] <- FALSE
Or you could rephrase to say it is only TRUE if it is TRUE and not missing:
a <- a & !is.na(a)
Following Ben Bolker's suggestion above, you could define your own function in the style of is.na():
is.true <- function(x) {
!is.na(x) & x
}
a = c(T,F,F,NA,F,T,NA,F,T)
is.true(a)
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
This also works for subsetting data.
b = c(1:9)
df <- as.data.frame(cbind(a,b))
df[is.true(df$a),]
a b
1 1 1
6 1 6
9 1 9
It also avoids accidentally including empty rows when the data contain NAs, which is what happens with ==:
df[df$a == TRUE,]
a b
1 1 1
NA NA NA
6 1 6
NA.1 NA NA
9 1 9
I like the is.element() function:
is.element(a, T)
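As a quick check, this sketch applies the answers above to the question's vector and confirms that they agree. The vapply(a, isTRUE, logical(1)) line is an extra variant, not taken from the answers; only a == TRUE still leaves NAs in place.

```r
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
expected <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

r1 <- a %in% TRUE                      # match against TRUE; NA never matches
r2 <- is.element(a, TRUE)              # same test as %in%
r3 <- a & !is.na(a)                    # NA & FALSE is FALSE
r4 <- replace(a, is.na(a), FALSE)      # overwrite the NAs directly
r5 <- vapply(a, isTRUE, logical(1))    # isTRUE() one element at a time (extra)

sapply(list(r1, r2, r3, r4, r5), identical, expected)
a == TRUE                              # still contains NA
```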

R Programming : Logical dataframe to actual dataframe

I need to convert or manipulate the records based on the logical dataframe in R.
I want to match it against the original data frame, keeping only the values whose flag is TRUE and putting null where the flag is FALSE, while maintaining the data frame structure. Any suggestions?
For example:
Original dataframe
ID Name Title
1 John Mr
2 Mike Mr
3 Susan Dr
Logical Dataframe
ID Name Title
False False False
False True False
False False True
Expected Dataframe
ID Name Title
2 Mike <null>
3 <null> Dr
Here's a shot:
orig <- read.table(text="ID Name Title
1 John Mr
2 Mike Mr
3 Susan Dr", header = TRUE, stringsAsFactors = FALSE)
lgl <- read.table(text="ID Name Title
False False False
False True False
False False True", header = TRUE, stringsAsFactors = FALSE)
newdf <- mapply(function(d,l) { d[!l] <- NA; d; }, orig, lgl)
newdf
# ID Name Title
# [1,] NA NA NA
# [2,] NA "Mike" NA
# [3,] NA NA "Dr"
newdf[ rowSums(!is.na(newdf)) > 0, ]
# ID Name Title
# [1,] NA "Mike" NA
# [2,] NA NA "Dr"
Your expected output is inconsistent: your $ID column is all FALSE, yet you keep the IDs in your output. You can fix that by setting those flags to TRUE and changing the filter to rowSums(!is.na(newdf)) > 1.
Explanation:
mapply runs a function (named or anonymous) on one or more lists, like a "zipper" function. That is:
mapply(func, 1:3, 4:6, 7:9, SIMPLIFY=FALSE)
is equivalent to
list(func(1,4,7), func(2,5,8), func(3,6,9))
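To make the "zipper" equivalence concrete, here is a sketch with a hypothetical three-argument function f (any function would do):

```r
f <- function(a, b, c) a + b + c   # hypothetical example function

zipped  <- mapply(f, 1:3, 4:6, 7:9, SIMPLIFY = FALSE)
by_hand <- list(f(1L, 4L, 7L), f(2L, 5L, 8L), f(3L, 6L, 9L))
identical(zipped, by_hand)    # TRUE

mapply(f, 1:3, 4:6, 7:9)      # default SIMPLIFY = TRUE gives the vector 12 15 18
```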
!is.na(newdf) creates a logical matrix with the same dimensions and names as newdf.
Since sum() of a logical vector returns the number of TRUE elements, rowSums(...) returns a vector with one element per row: the number of TRUEs in that row.
... > 0 turns that into a logical vector that keeps only the rows with at least one non-NA element.
You said you always want to preserve $ID. In that case, you probably want to do this before processing:
lgl$ID <- TRUE
and change the condition to ... > 1, meaning "at least two non-NA elements, one of which we know is ID".
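Here is a sketch of that suggestion that also returns a data frame; mapply() above returns a character matrix, whereas the question asks to keep the data frame structure. It swaps mapply() for indexing the data frame with a logical matrix, and builds orig and lgl with data.frame() so the block runs on its own. A data frame cell cannot hold NULL, so NA stands in for <null>:

```r
orig <- data.frame(ID = 1:3, Name = c("John", "Mike", "Susan"),
                   Title = c("Mr", "Mr", "Dr"), stringsAsFactors = FALSE)
lgl <- data.frame(ID = c(FALSE, FALSE, FALSE), Name = c(FALSE, TRUE, FALSE),
                  Title = c(FALSE, FALSE, TRUE))

lgl$ID <- TRUE                     # always keep the ID column
newdf <- orig
newdf[!as.matrix(lgl)] <- NA       # blank every cell whose flag is FALSE
# Keep rows with at least two non-NA cells: the ID plus one real value
newdf <- newdf[rowSums(!is.na(newdf)) > 1, ]
newdf
```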

reporting identical values across columns in matrix

I have a matrix that I am performing a for loop over. I want to know if the values of position i in the for loop exist anywhere else in the matrix, and if so, report TRUE. The matrix looks like this
dim
x y
[1,] 5 1
[2,] 2 2
[3,] 5 1
[4,] 5 9
In this case, dim[1,] is the same as dim[3,] and should therefore report TRUE if I am in position i=1 in the for loop. I could write another for loop to deal with this, but I am sure there are more clever and possibly vectorized ways to do this.
We can use duplicated():
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
duplicated(m1) returns a logical vector that is TRUE for each row that repeats an earlier row:
duplicated(m1)
#[1] FALSE FALSE TRUE FALSE
Here the third row duplicates the first. If we need both the first and the third row flagged, we can also run duplicated() from the other end (fromLast=TRUE) and combine the two with |:
duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE FALSE FALSE
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
According to ?duplicated, the input data can be
x: a vector or a data frame or an array or ‘NULL’.
data
m1 <- cbind(x=c(5,2,5,5), y=c(1,2,1,9))
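Back in the asker's for loop, a sketch: compute the combined duplicated() vector once before the loop, so each iteration only needs a lookup. The message() call is a placeholder for whatever the loop actually does:

```r
m1 <- cbind(x = c(5, 2, 5, 5), y = c(1, 2, 1, 9))

# Computed once, outside the loop: TRUE for every row that has an identical copy
has_twin <- duplicated(m1) | duplicated(m1, fromLast = TRUE)

for (i in seq_len(nrow(m1))) {
  # Inside the loop, the check for row i is a simple lookup
  if (has_twin[i]) message("row ", i, " also appears elsewhere in the matrix")
}
```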

Measure the max value of all previous values in a data frame

I am trying to make a function that determines whether a value in a column of a data frame is a new high. For example, with the following data:
x <- rnorm(10,100,sd=5)
x <- data.frame(x)
How can I return TRUE or FALSE in a new column, taking into account only the previous values? The resulting table would look something like this:
x new.max
1 102.42810 NA
2 109.22762 TRUE
3 101.97970 FALSE
4 101.49303 FALSE
5 93.30595 FALSE
6 96.77199 FALSE
7 110.96441 TRUE
8 96.27485 FALSE
9 101.77163 FALSE
10 100.78992 FALSE
If I try
x$new.max <- ifelse ( x$x == max(x$x) , TRUE, FALSE )
This gives the table below, because it compares each value with the maximum of the entire column instead of the maximum of the previous values.
x new.max
1 102.42810 FALSE
2 109.22762 FALSE
3 101.97970 FALSE
4 101.49303 FALSE
5 93.30595 FALSE
6 96.77199 FALSE
7 110.96441 TRUE
8 96.27485 FALSE
9 101.77163 FALSE
10 100.78992 FALSE
There is a built-in function that computes the running maximum, called cummax().
diff(cummax(x)) will be non-zero at positions where a new maximum is achieved (there's no entry for the first element of x, which is always a new maximum).
Putting the pieces together (x is a data frame in the question, so apply it to the column x$x):
x$new.max <- c(TRUE, diff(cummax(x$x)) > 0)
I've set the first element to TRUE, but it could just as well be NA.
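A self-contained check, using the ten values copied from the question's table (copied rather than regenerated, since rnorm() output differs between runs). This sketch uses NA for the first row to match the desired table:

```r
# Values copied from the example table in the question
x <- data.frame(x = c(102.42810, 109.22762, 101.97970, 101.49303, 93.30595,
                      96.77199, 110.96441, 96.27485, 101.77163, 100.78992))

running_max <- cummax(x$x)                 # running maximum of the column
x$new.max <- c(NA, diff(running_max) > 0)  # NA: row 1 has no previous values
x                                          # rows 2 and 7 are TRUE
```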

Resources