Find indices of rows from matrix A in matrix B - r

Let's consider two matrices A and B, where the rows of A are a subset of the rows of B. How can I find the index of each row of A in matrix B?
Here is a reproducible example:
set.seed(30)
B <- matrix(rnorm(n =30,mean = 0), ncol=3)
A <- subset(B, B[,1] > 1)
The goal is to find the indices idx, which in this case are rows 4 and 5.

Nested apply loops should do it.
apply(A, 1, function(a)
which(apply(B, 1, function(b) all(b==a)))
)
# [1] 4 5
Or alternatively, using colSums
apply(A, 1, function(a)
which(colSums(t(B) == a) == ncol(B)))
# [1] 4 5

Alternatively, you could do this:
# flag the rows of B that repeat a row of A, then report their positions in B
which(duplicated(rbind(A, B))[-seq_len(nrow(A))])
A nice solution without apply, originally by @Arun.

> match(apply(A, 1, paste, collapse="\b"), apply(B, 1, paste, collapse="\b"))
[1] 4 5
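The paste-based lookup can be wrapped in a small helper. This is only a sketch: `row_index` is a hypothetical name introduced here, and the deterministic matrices stand in for the random example above.

```r
# Hypothetical helper: position in B of each row of A (NA if a row is absent).
row_index <- function(A, B, sep = "\b") {
  key_A <- apply(A, 1, paste, collapse = sep)  # one string key per row of A
  key_B <- apply(B, 1, paste, collapse = sep)  # one string key per row of B
  match(key_A, key_B)                          # first matching row of B
}

B <- matrix(1:12, ncol = 3)        # rows: (1,5,9), (2,6,10), (3,7,11), (4,8,12)
A <- B[c(4, 2), , drop = FALSE]    # rows 4 and 2 of B, in that order
idx <- row_index(A, B)
idx
# [1] 4 2
```

Because the keys are built with paste, numeric values are compared through their character representation, and the result follows the row order of A rather than B.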

This takes a slightly different approach and relies on the fact that a matrix is a vector, so it won't work if you have data.frames:
which( B %in% A , arr.ind=TRUE )[1:nrow(A)]
#[1] 4 5
And if you had really big matrices and wanted to be a bit more efficient you could use %in% on a subset like so:
which( B[1:nrow(B)] %in% A[1:nrow(A)] , arr.ind=TRUE )
But I don't expect this would make too much of a difference except in really big matrices.
If you had your data as data.frames you could do the same thing by passing just the first column to which:
A <- data.frame(A)
B <- data.frame(B)
which( B$X1 %in% A$X1 )
#[1] 4 5
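Matching on the first column alone is only safe when a first-column value of A never appears in a row of B that is absent from A. A small sketch with made-up data where that assumption fails (the `key` helper is hypothetical and builds one string per row):

```r
B <- data.frame(X1 = c(1, 2, 2, 3), X2 = c(10, 20, 30, 40))
A <- B[2, ]                         # a single row: X1 = 2, X2 = 20

first_col <- which(B$X1 %in% A$X1)  # compares the first column only
first_col                           # rows 2 and 3 both have X1 == 2
# [1] 2 3

# Comparing whole rows instead, via one string key per row
key <- function(d) do.call(paste, c(d, sep = "\r"))
full_row <- match(key(A), key(B))
full_row
# [1] 2
```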

Related

Differing number of rows

Suppose I have a vector of numbers a
a <- c(1, 2, 3, 4, 5, 6)
and a vector of positions b
b <- c(1, 2, 3)
Then I want to get the numbers in the vector a up to each position in b.
I do this with lapply(b, function(x) a[1:x]) and I get the result
[[1]]
[1] 1
[[2]]
[1] 1 2
[[3]]
[1] 1 2 3
Now I want to combine them in a data frame. Normally, if the number of values for every position were equal, I would have done t(as.data.frame(lapply(b, function(x) a[1:x]))). But I cannot do that here because the numbers of values differ. How can I put zeros for the non-existing values?
If the output list is 'lst1', then make the lengths the same with the length<- assignment and replace the resulting NA padding with 0:
lapply(lst1, function(x) {
length(x) <- max(lengths(lst1))
replace(x, is.na(x), 0)})
data
lst1 <- lapply(b, function(x) a[1:x])
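To get from the padded list to the data frame the question asks for, the vectors can be bound row-wise. A minimal sketch, assuming zeros are the desired fill value; `n_max`, `padded` and `result` are names introduced here:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(1, 2, 3)
lst1 <- lapply(b, function(x) a[1:x])

n_max <- max(lengths(lst1))             # length of the longest vector
padded <- lapply(lst1, function(x) {
  length(x) <- n_max                    # extends x with NA
  replace(x, is.na(x), 0)               # turns the NA padding into 0
})

result <- as.data.frame(do.call(rbind, padded))  # one row per position in b
result
```

Note that replace() here would also overwrite any genuine NA values in a, so this only fits data without missing values.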

Conditions & Subtraction from Matrix in R

I've looked at R create a vector from conditional operation on matrix, and using a similar solution does not yield what I want (and I'm not sure why).
My goal is to evaluate df element by element with the following condition: if df > 2, then df - 2, else 0
Take df:
a <- seq(1,5)
b <- seq(0,4)
df <- cbind(a,b) %>% as.data.frame()
df is simply:
a b
1 0
2 1
3 2
4 3
5 4
df_final should look like this after a suitable function:
a b
0 0
0 0
1 0
2 1
3 2
I applied the following function with the result, and I'm not sure why it doesn't work (further explanation of a solution would be appreciated)
apply(df,2,function(df){
ifelse(any(df>2),df-2,0)
})
Yielding the following:
a b
-1 -2
Thank you SO community!
Let's fix your function and understand why it didn't work:
apply(df, # apply to df
2, # to each *column* of df
function(df){ # this function. Call the function argument (each column) df
# (confusing because this is the same name as the data frame...)
ifelse( # Looking at each column...
any(df > 2), # if there are any values > 2
df - 2, # then df - 2
0 # otherwise 0
)
})
any() returns a single value. ifelse() returns something the same shape as the test, so by making your test any(df > 2) (a single value), ifelse() will also return a single value.
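The effect of the test's length can be seen on a single column. A short sketch using column a from the question (`scalar_test` and `vector_test` are names introduced here):

```r
x <- 1:5                                     # column a from the question

scalar_test <- ifelse(any(x > 2), x - 2, 0)  # test has length 1
vector_test <- ifelse(x > 2, x - 2, 0)       # test has length 5

scalar_test   # only the first element of x - 2 is used
# [1] -1
vector_test
# [1] 0 0 1 2 3
```

This matches the -1 reported for column a in the question.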
Let's fix this by (a) changing the function to be of a different name than the input (for readability) and (b) getting rid of the any:
apply(df, # apply to df
2, # to each *column* of df
function(x){ # this function. Call the function argument (each column) x
ifelse( # Looking at each column...
x > 2, # when x is > 2
x - 2, # make it x - 2
0 # otherwise 0
)
})
apply is made for working on matrices. When you give it a data frame, the first thing it does is convert it to a matrix. If you want the result to be a data frame, you need to convert it back to a data frame.
Or we can use lapply instead. lapply returns a list, and by assigning it to the columns of df with df[] <- lapply(), we won't need to convert. (And since lapply doesn't do the matrix conversion, it knows by default to apply the function to each column.)
df[] <- lapply(df, function(x) ifelse(x > 2, x - 2, 0))
As a side note, df <- cbind(a,b) %>% as.data.frame() is a more complicated way of writing df <- data.frame(a, b)
Create the 'out' dataset by subtracting 2, then replace the negative values with 0
out <- df - 2
out[out < 0] <- 0
Or in a single step
(df-2) * ((df - 2) > 0)
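As a quick check that these variants agree, here is a sketch that builds the same df with base R only and compares the results (`masked` and `looped` are names introduced here):

```r
df <- data.frame(a = 1:5, b = 0:4)

# Two-step version: subtract, then zero out the negatives
out <- df - 2
out[out < 0] <- 0

# One-step version: multiply by a logical mask (TRUE counts as 1, FALSE as 0)
masked <- (df - 2) * ((df - 2) > 0)

# Column-wise ifelse, for comparison
looped <- df
looped[] <- lapply(df, function(x) ifelse(x > 2, x - 2, 0))

all(out == masked)   # TRUE
all(out == looped)   # TRUE
```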
Using apply
library(dplyr) # provides %>% and as_tibble()
a <- seq(1,5)
b <- seq(0,4)
df <- cbind(a,b) %>% as.data.frame()
# apply converts df to a matrix and returns a matrix
new_matrix <- apply(df, MARGIN=2,function(i)ifelse(i >2, i-2,0))
new_matrix
###if you want it to return a tibble/df
new_tibble <- apply(df, MARGIN=2,function(i)ifelse(i >2, i-2,0)) %>% as_tibble()

Removing all subsets from a list

I have a list that looks as follows:
a <- c(1, 3, 4)
b <- c(0, 2, 6)
c <- c(3, 4)
d <- c(0, 2, 6)
list(a, b, c, d)
From this list I would like to remove all subsets such that the list looks as follows:
[[1]]
[1] 1 3 4
[[2]]
[1] 0 2 6
How do I do this? In my actual data I am working with a very long list (> 500k elements) so any suggestions for an efficient implementation are welcome.
Here is an approach.
lst <- list(a, b, c, d) # The list
First, remove all duplicates.
lstu <- unique(lst)
If the list still contains more than one element, we order the list by the lengths of its elements (decreasing).
lstuo <- lstu[order(-lengths(lstu))]
Then subsets can be filtered with this command:
lstuo[c(TRUE, !sapply(2:length(lstuo),
function(x) any(sapply(seq_along(lstuo)[-x],
function(y) all(lstuo[[x]] %in% lstuo[[y]])))))]
The result:
[[1]]
[1] 1 3 4
[[2]]
[1] 0 2 6
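The steps above can be collected into one function. This is a sketch rather than a tuned implementation: `drop_subsets` is a hypothetical name, it assumes no vector contains repeated values, and it only compares each vector with the ones sorted before it, since a vector cannot be a subset of a shorter one.

```r
# Hypothetical helper: drop every vector that is a subset of another one.
drop_subsets <- function(lst) {
  u <- unique(lst)                   # drop exact duplicates
  u <- u[order(-lengths(u))]         # longest first
  keep <- vapply(seq_along(u), function(i) {
    earlier <- u[seq_len(i - 1)]     # only these can contain u[[i]]
    !any(vapply(earlier, function(o) all(u[[i]] %in% o), logical(1)))
  }, logical(1))
  u[keep]
}

res <- drop_subsets(list(c(1, 3, 4), c(0, 2, 6), c(3, 4), c(0, 2, 6)))
res
```

The number of comparisons is still quadratic in the worst case, so for a list of more than 500k elements this is likely to need further work.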
Alternative approach
Your data
lst <- list(a, b, c, d) # The list
lstu <- unique(lst) # remove duplicates, piggyback Sven's approach
Make matrix of values and index
m <- combn(lstu, 2) # 2-row matrix of non-self pairwise combinations of values
n <- combn(length(lstu), 2) # 2-row matrix of non-self pairwise combinations of indices
Determine if subset
issubset <- t(sapply(list(c(1,2),c(2,1)), function(z) mapply(function(x,y) all(x %in% y), m[z[1],], m[z[2],])))
Discard subset vectors from list
discard <- c(n*issubset)[c(n*issubset)>0]
ans <- lstu[-discard]
Output
[[1]]
[1] 1 3 4
[[2]]
[1] 0 2 6

subset in R based on factor level n-times given a vector of matching variables

Newbie in R and I've been trying to find a neat (not using a loop) way to do the following:
x <- c(0, 4)
y <- c(1, 2)
df <- data.frame(x,y)
therefore if I want to output all x for which y=1:
df$x[df$y == 1]
but what if I have a vector such as a <- c(1,1,1)?
I can't just do:
df$x[df$y == a]
because it subsets just once:
[1] 0
but I want the output to be the vector c(0,0,0)
Obviously this isn't the way to go about it, but any clues as to which is?
Thanks!
I think what you're after is %in%. Try:
df$x[df$y %in% a]
I think you are looking for %in%:
df$x[df$y %in% a]
%in% returns TRUE for each value in df$y when it is in a.
Proper way to do this is
df[df$y %in% a,]$x
or
df[df$y %in% a,'x']
According to your question, the desired result is the vector c(0,0,0). One way you could achieve that is:
rep(df$x[df$y %in% a], length(a))
#[1] 0 0 0
But you need to be aware of the implications, for example if you change a so that it contains different numbers. Here's another example:
a <- c(3,1,2)
rep(df$x[df$y %in% a], length(a))
#[1] 0 4 0 4 0 4
So in this case, the output has a length of 2*length(a) because two different values of a match an entry in df$y. It is not clear from your question what behavior you want in such a case. So here's a third example if you want each value of a repeated only as often as the number of elements in a that are also present in df$y:
a <- c(3,1,2)
rep(df$x[df$y %in% a], length(a[a %in% df$y]))
#[1] 0 4 0 4
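If the goal is one result per element of a, match() is another option. It is not used in the answers above; this sketch assumes the values in df$y are unique, because match() returns only the first matching position (`by_position`, `a2` and `with_na` are names introduced here).

```r
df <- data.frame(x = c(0, 4), y = c(1, 2))

a <- c(1, 1, 1)
by_position <- df$x[match(a, df$y)]   # one result per element of a
by_position
# [1] 0 0 0

a2 <- c(3, 1, 2)
with_na <- df$x[match(a2, df$y)]      # 3 has no match in df$y, so NA
with_na
# [1] NA  0  4
```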

Remove same indices from two vectors

I have two vectors in R, e.g.
a <- c(2,6,4,9,8)
b <- c(8,9,4,2,1)
Vectors a and b are ordered in a way that I wish to conserve (I will be plotting them against each other). I want to remove certain values from vector a and remove the values at the same indices in b. e.g. if I wanted to remove values ≥ 8 from a:
a <- a[a<8]
... which gives a new vector without those values.
Is there now an easy way of removing values from the same indices in b (in this example indices 4 and 5)? Perhaps by using a data frame?
Something like this:
keep <- a < 8
a <- a[keep]
b <- b[keep]
You could also use:
keep <- which( a < 8 )
If the vectors are logically part of the same data, use a data frame. It is better programming practice.
df <- data.frame(a = a, b = b)
df <- df[df$a < 8, ]
Otherwise, store the indices to keep in another vector:
keep <- a < 8
a <- a[keep]
b <- b[keep]
Why not:
d <- data.frame(a=a, b=b)
d <- d[d$a < 8, ]
Or even:
d <- subset(d, a < 8)
First remove the indices from b then from a
b <- b[a<8]
a <- a[a<8]
a<8 returns a logical vector that marks which elements of a are smaller than 8. b has to be subset first because the condition is re-evaluated on a each time.
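Computing the mask once, before either vector is overwritten, avoids having to think about the order of the two assignments. A short sketch with the vectors from the question (`a_new`, `b_new` and `dropped` are names introduced here):

```r
a <- c(2, 6, 4, 9, 8)
b <- c(8, 9, 4, 2, 1)

keep <- a < 8            # TRUE TRUE TRUE FALSE FALSE
a_new <- a[keep]         # 2 6 4
b_new <- b[keep]         # 8 9 4, same mask so the pairs stay aligned

dropped <- which(!keep)  # positions removed from both vectors: 4 5
```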
If this is purely for plotting, you can avoid touching b and the x-axis by setting the unwanted values of a to NA.
a[a>=8]<-NA
plot(b,a) #works for point or line graphs
