reporting identical values across columns in matrix - r

I have a matrix that I am performing a for loop over. I want to know if the values of position i in the for loop exist anywhere else in the matrix, and if so, report TRUE. The matrix looks like this
dim
x y
[1,] 5 1
[2,] 2 2
[3,] 5 1
[4,] 5 9
In this case, dim[1,] is the same as dim[3,] and should therefore report TRUE if I am in position i=1 in the for loop. I could write another for loop to deal with this, but I am sure there are more clever and possibly vectorized ways to do this.

We can use duplicated
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
The duplicated(m1) gives a logical vector of 'TRUE/FALSE' values. If there is a duplicate row, it will be TRUE
duplicated(m1)
#[1] FALSE FALSE TRUE FALSE
In this case, the third row is duplicate of first row. Suppose if we need both the first and third row, we can do the duplication from the reverse side and use | to make both positions TRUE. i.e.
duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE FALSE FALSE
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
According to ?duplicated, the input data can be
x: a vector or a data frame or an array or ‘NULL’.
data
m1 <- cbind(x=c(5,2,5,5), y=c(1,2,1,9))

Related

How does R use square brackets to return values in a vector?

I came across a question like this: "retrieve all values less than or equal to 5 from a vector of sequence 1 through 9 having a length of 9". Now based on my knowledge so far, I did trial & error, then I finally executed the following code:
vec <- c(1:9) ## assigns to vec
lessThanOrEqualTo5 <- vec[vec <= 5]
lessThanOrEqualTo5
[1] 1 2 3 4 5
I know that the code vec <= 5 would return the following logical
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
So my question is, how does R use these logical to return the appropriate values satisfying the condition since the code would end up having a structure like this vec[TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE]?

How to find occurrences in a set of numbers in a vector

Ive recently been put in a R coding class and I've been having trouble with this problem. Every way i try to approach this problem, my code ends up wrong. Whats the best way to answer this? thank you.
Write an R program that return the occurrences of a set of values in a vector.
For instance, if the vector is [1,2,3,4,4,5,6,5,7,8,9,10,5] and the set of values is [5,4], then the result is 5, because in the vector there are two occurrence of the value 4, and three occurrences of the value 5.
The function accepts as input:
a vector representing the numbers to analyze;
a vector representing the set of number to count.
The function returns:
a number representing the occurrences of the set values in the list of
numbers to analyze.
Here are some hints that can help you create your desired function. This is not the best way, but this way has helped me a lot in understanding what I should do. This way I am talking about is breaking down your problem into examples or smaller problems.
Firstly, let us store your vector in the object x.
x <- c(1,2,3,4,4,5,6,5,7,8,9,10,5)
Now if we type:
x==4
we get:
FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
If we type:
x==5
we get
FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
If we type:
x==4|x==5
which asks if each element of x is 4 or 5, we get:
FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
The remaining thing to do is count the number of TRUE in the last output.
If we type
as.numeric(x==4|x==5)
we get
0 0 0 1 1 1 0 1 0 0 0 0 1
In fact, we can simply type
sum(x==4|x==5)
or
length(x[x==4|x==5])
to get the desired answer of 5. Another way of doing this is with the %in% function. Suppose we have
y <- c(4,5)
Then
sum(x %in% y)
will also give us the desired number of 4 or 5 in x. Now, it remains for you to write out sum(x %in% y) as a function in x and y to use it for arbitrary vector x and arbitrary vector y.

Count by row with variable criteria

I have a data.frame in which I want to perform a count by row versus a specified criterion. The part I cannot figure out is that I want a different count criterion for each row.
Say I have 10 rows, I want 10 different criteria for the 10 rows.
I tried: count.above <- rowSums(Data > rate), where rate is a vector with the 10 criterion, but R used only the first as the criterion for the whole frame.
I imagine I could split my frame into 10 vectors and perform this task, but I thought there would be some simple way to do this without resorting to that.
Edit: this depends whether you want to operate over rows or columns. See below:
This is a job for mapply and Reduce. Suppose you have a data frame along the lines of
df1 <- data.frame(a=1:10,b=2:11,c=3:12)
Let's say we want to count the rows where a>6, b>3 and c>5. This is done with mapply:
mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)
$a
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
$b
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
$c
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now we use Reduce to find those which are all TRUE:
Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
Lastly, we use sum to add them all up:
sum(Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)))
[1] 4
If you want a result for each row rather than a global aggregate, then apply is the function to use:
apply(df1,1,function(v) sum(v>c(6,3,5)))
[1] 0 0 1 2 2 2 3 3 3 3
Given the dummy data (from #zx8754s solution)
# dummy data
df1 <- data.frame(matrix(1:15, nrow = 3))
myRate <- c(7, 5, 1)
Solution using apply
Courtesy of #JDL
rowSums(apply(df1, 2, function(v) v > myRate))
Alternative solution using the Reduce pattern
Reduce(function(l, v) cbind(l[,1] + (l[,2] > myRate), l[,-2:-1]),
1:ncol(df1),
cbind(0, df1))

Writing a boolean matrix to a string

I have a lower triangular matrix containing TRUE/FALSE values. The matrix is created from a pairwise.t.test and a comparison to the acceptable p-value (p<0.05 => TRUE).
I am trying to output the matrix true values in a string according to a specific formatting without using a mess of if conditions. My thoughts were on matrix products/sums to achieve it, but there may be no elegant solution. If you think it's impossible to do it, I would like to know it aswell so I don't hit my head on the wall forever
The formatting:
If a pair of values (ex:1,2) are TRUE, we output it as "1≠2".
If a value is TRUE with multiple values (ex: 1 with 2,3), we output it as "1≠2,3".
If a value is TRUE with everyone(ex:1 with 2,3,4) we use the word "all" => output is "1≠all"
If 2 pairs (ex:1,2 and 3,4) are TRUE, we separate them with a space. output is "1≠2 3≠4"
If everything is TRUE, we output "all≠"
As of now, I am doing it manually so I don't really have any code to show. I am open to any ideas :)
Examples:
1 2 3
2 TRUE NA NA
3 TRUE TRUE NA
4 TRUE TRUE FALSE
The string for this matrix would be "1,2≠all" because 1 and 2 are true with everyone.
1 2 3
2 FALSE NA NA
3 TRUE TRUE NA
4 TRUE TRUE FALSE
The string for this matrix would be "1,2≠3,4 because 1 is true with 3,4 and 2 is true with 3,4.
Test matrices:
mTest = matrix(c(T,T,T,NA,F,T,NA,NA,F),nrow=3,ncol=3) # "1≠all 2≠3"
row.names(mTest) <- c(2,3,4) ; colnames(mTest) <- c(1,2,3)
mTest[] = c(T,F,T,NA,F,T,NA,NA,F) # "1≠2 1,2≠4"
mTest[] = c(T,T,T,NA,T,F,NA,NA,T) # "1,3≠all"

Remove rows from dataframe with boolean values

I have the following dataframe a:
> a <- cbind(c(FALSE,FALSE,TRUE,TRUE),c(TRUE,FALSE,FALSE,TRUE))
> a
[,1] [,2]
[1,] FALSE TRUE
[2,] FALSE FALSE
[3,] TRUE FALSE
[4,] TRUE TRUE
I want to remove all rows whose first column value and second column value is false. Note that I do have some other, non-boolean columns.
So you want to keep each row which contains at least one TRUE column:
keep <- a[,1] | a[,2]
a <- a[keep, ]
You can use rowSums.
a[(rowSums(a[,1:2])!=0),]

Resources