Comparing rows of matrix and replacing matching elements - r

I want to compare two matrices. If an element in the first column of a row of the second matrix matches an element in the first matrix, I want that row of the second matrix to be kept. If it does not match, I want that row to be set to empty. I apologise for asking a quite similar question recently, but I still haven't been able to solve this one.
INPUT:
> mat1<-cbind(letters[3:8])
> mat1
[,1]
[1,] "c"
[2,] "d"
[3,] "e"
[4,] "f"
[5,] "g"
[6,] "h"
> mat2<-cbind(letters[1:5],1:5)
> mat2
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
Expected OUTPUT:
> mat3
[,1] [,2]
[1,] "NA" "NA"
[2,] "NA" "NA"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
I have unsuccessfully attempted this:
> mat3<-mat2[ifelse(mat2[,1] %in% mat1[,1],mat2,""),]
Error in mat2[ifelse(mat2[, 1] %in% mat1[, 1], mat2, ""), ] :
no 'dimnames' attribute for array
I have been struggling for hours, so any suggestions are welcome.

You were on the right track, but the answer is a little simpler than what you were trying. mat2[, 1] %in% mat1[, 1] returns the matches as a logical vector, and we can just set the non-matches to NA using that vector as an index.
mat1<-cbind(letters[3:8])
mat2<-cbind(letters[1:5],1:5)
match <- mat2[,1] %in% mat1 # gives a T/F vector of matches
mat3 <- mat2
mat3[!match,] <- NA
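As a minimal sketch of the same idea, the block below repeats the replacement and adds a variant that blanks the rows with empty strings. Reading "empty" as "" is an assumption about what the question wants, and the names keep and mat3_blank are only illustrative.

```r
mat1 <- cbind(letters[3:8])
mat2 <- cbind(letters[1:5], 1:5)

# Logical vector: TRUE where the first column of mat2 occurs in mat1
keep <- mat2[, 1] %in% mat1[, 1]

# Variant A: real missing values (NA), as in the answer above
mat3 <- mat2
mat3[!keep, ] <- NA

# Variant B: empty strings instead of NA (assumed meaning of "empty")
mat3_blank <- mat2
mat3_blank[!keep, ] <- ""

print(mat3)
print(mat3_blank)
```

Note that variant A stores a true NA, not the string "NA" shown in the expected output, so is.na() works on the result.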

Related

Replace values in one matrix with values from another

I am a programming newbie attempting to compare two matrices. If an element from the first column of mat1 matches any element from the first column of mat2, then I want that matching element in mat1 to be replaced with its neighbour in mat2 (same row, different column).
INPUT:
mat1<-matrix(letters[1:5])
mat2<-cbind(letters[4:8],1:5)
> mat1
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "d"
[5,] "e"
> mat2
[,1] [,2]
[1,] "d" "1"
[2,] "e" "2"
[3,] "f" "3"
[4,] "g" "4"
[5,] "h" "5"
Desired OUTPUT:
> mat3
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "1"
[5,] "2"
I have attempted the following without succeeding:
> for(x in mat1){mat3<-ifelse(x==mat2,mat2[which(x==mat2),2],mat1)}
> mat3
[,1] [,2]
[1,] "a" "a"
[2,] "2" "b"
[3,] "c" "c"
[4,] "d" "d"
[5,] "e" "e"
Any advice would be much appreciated. I have spent a whole day on this without making it work. It doesn't matter to me if the elements are in a matrix or a data frame.
Thanks.
ifelse is vectorized, so we can use it on the whole column. The test condition in ifelse checks whether each value in the first column of 'mat1' is %in% the first column of 'mat2'. Where it is, match gives the index of the corresponding row in 'mat2', and we use that index to extract the value from the second column; otherwise we return the original value from the first column of 'mat1'.
mat3 <- matrix(ifelse(mat1[,1] %in% mat2[,1],
mat2[,2][match(mat1[,1], mat2[,1])], mat1[,1]))
mat3
# [,1]
#[1,] "a"
#[2,] "b"
#[3,] "c"
#[4,] "1"
#[5,] "2"
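To make the pieces of that one-liner easier to follow, here is a sketch that computes each intermediate value separately, using the mat1 and mat2 from the question. The names in_mat2, idx and replacement are only illustrative.

```r
mat1 <- matrix(letters[1:5])
mat2 <- cbind(letters[4:8], 1:5)

# TRUE where the value of mat1 also appears in the first column of mat2
in_mat2 <- mat1[, 1] %in% mat2[, 1]

# Row of mat2 holding each value of mat1 (NA where there is no match)
idx <- match(mat1[, 1], mat2[, 1])

# Second-column value for each matched row (NA where there is no match)
replacement <- mat2[idx, 2]

# Take the replacement where there is a match, else keep the original
mat3 <- matrix(ifelse(in_mat2, replacement, mat1[, 1]))
print(mat3)
```

Because match returns one index per element of mat1, the lookup does not depend on the order of the rows in mat2.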
Here is another base R solution, using a named vector as a lookup table:
v <- `names<-`(mat2[,2],mat2[,1])
mat3 <- matrix(unname(ifelse(is.na(v[mat1]),mat1,v[mat1])))
which gives
> mat3
[,1]
[1,] "a"
[2,] "b"
[3,] "c"
[4,] "1"
[5,] "2"
An option using only logical indexing rather than ifelse:
mat3 <- mat1
mat3[mat1[,1] %in% mat2[,1], 1] <- mat2[mat2[,1] %in% mat1[,1], 2]
This subsets the values that occur in both matrices and replaces them where they do. Note that it relies on the matching elements appearing in the same order in mat1 and mat2.
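The following sketch illustrates how paired logical subsetting behaves when the lookup table is in a different order. The matrix mat2_shuffled is a hypothetical reordering of the question's mat2, and wrong, right and hit are illustrative names.

```r
mat1 <- matrix(letters[1:5])
# Hypothetical lookup table with "e" listed before "d"
mat2_shuffled <- cbind(c("e", "d", "f", "g", "h"), c(2, 1, 3, 4, 5))

# Paired logical subsetting: replacements are taken in mat2's row order
wrong <- mat1
wrong[mat1[, 1] %in% mat2_shuffled[, 1], 1] <-
  mat2_shuffled[mat2_shuffled[, 1] %in% mat1[, 1], 2]

# match() finds the row of each matched element individually
right <- mat1
hit <- mat1[, 1] %in% mat2_shuffled[, 1]
right[hit, 1] <- mat2_shuffled[match(mat1[hit, 1], mat2_shuffled[, 1]), 2]

print(wrong[, 1])  # "d" and "e" receive each other's values
print(right[, 1])
```

With the original, unshuffled mat2 both approaches give the same result.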

Make r ignore the order at which values appear in a column (created from pasting multiple columns)

Given a variable x that can take values A,B,C,D
And three columns for variable x:
df1<-
rbind(c("A","B","C"),c("A","D","C"),c("B","A","C"),c("A","C","B"), c("B","C","A"), c("D","A","B"), c("A","B","D"), c("A","D","C"), c("A",NA,NA),c("D","A",NA),c("A","D",NA))
How do I make a column indicating the combination of values in the three preceding columns, such that permutations (ABC, ACB, BAC) are treated as the same combination ABC, and (AD, DA) as the same combination AD?
Pasting the three columns with apply(df1, 1, function(x) paste(x[!is.na(x)], collapse=", ")) -> df1$x4 and then using df1 %>% group_by(x4) %>% summarize(c = n()) would count AD and DA as different instead of the same.
My desired result would be to get
a<-cbind(c("ABC",4),c("ACD",2),c("ABD",2),c("A",1),c("AD",2))
Someone already solved my question. Thanks
You can apply paste after sorting each row vector. By default, sort also drops the NA values.
df1 <-
cbind(df1, apply(df1, 1, function(x) paste(sort(x), collapse = "")))
df1
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "ABC"
# [2,] "A" "D" "C" "ACD"
# [3,] "B" "A" "C" "ABC"
# [4,] "A" "C" "B" "ABC"
# [5,] "B" "C" "A" "ABC"
# [6,] "D" "A" "B" "ABD"
# [7,] "A" "B" "D" "ABD"
# [8,] "A" "D" "C" "ACD"
# [9,] "A" NA NA "A"
#[10,] "D" "A" NA "AD"
#[11,] "A" "D" NA "AD"
You can now simply table the column, with no need for an external package to be loaded and more complex pipes.
table(df1[, 4])
#A ABC ABD ACD AD
#1 4 2 2 2
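As a small sketch of why this works, the block below shows what sort does to a single row containing NA, and then counts keys on a hypothetical four-row subset of df1. The names r1, key, small, keys and counts are illustrative.

```r
# A single row with a missing value
r1 <- c("D", "A", NA)
print(sort(r1))                       # the NA is dropped by default

# Sorting first makes the key independent of the original order
key <- paste(sort(r1), collapse = "")
print(key)

# Hypothetical subset of the rows from the question
small <- rbind(c("A", "B", "C"), c("B", "A", "C"),
               c("D", "A", NA),  c("A", "D", NA))
keys <- apply(small, 1, function(x) paste(sort(x), collapse = ""))
counts <- table(keys)
print(counts)
```

If the counts are needed as a two-column table rather than a named vector, as.data.frame(counts) converts them.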

Calculate the number of repeated combinations of elements in R

Suppose I have two vectors like this:
l1 = c('C','D','E','F')
l2 = c('G','C','D','F')
I generate all combinations of two elements using the combn function:
l1_vector = t(combn(l1,2))
l2_vector = t(combn(l2,2))
> l1_vector
[,1] [,2]
[1,] "C" "D"
[2,] "C" "E"
[3,] "C" "F"
[4,] "D" "E"
[5,] "D" "F"
[6,] "E" "F"
> l2_vector
[,1] [,2]
[1,] "G" "C"
[2,] "G" "D"
[3,] "G" "F"
[4,] "C" "D"
[5,] "C" "F"
[6,] "D" "F"
Now I want to count the rows that appear in both l1_vector and l2_vector. In the example I gave, the count should be 3 (["C","D"], ["C","F"], ["D","F"]).
How can I do that without using a loop?
As mentioned in the comments, you can use the merge function for this. Since the default behavior of merge is to use all of the available columns, it will return only those rows that are perfect matches.
> merge(l1_vector, l2_vector)
V1 V2
1 C D
2 C F
3 D F
>
> nrow(merge(l1_vector, l2_vector))
[1] 3
While merge works perfectly well for your case, there are alternatives.
If you just need the number of repeated elements:
choose(length(intersect(l1, l2)), 2)
[1] 3
If you need the repeated elements:
t(combn(intersect(l1, l2), 2))
[,1] [,2]
[1,] "C" "D"
[2,] "C" "F"
[3,] "D" "F"
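Both approaches above compare pairs as they come out of combn, so a pair counts as repeated only when its two elements are in the same order in both results. If the order within a pair should not matter, one possible sketch is to build an order-independent key for each pair and intersect the keys. The helper pair_keys is hypothetical, and the sketch assumes neither vector contains duplicated elements.

```r
l1 <- c("C", "D", "E", "F")
l2 <- c("G", "C", "D", "F")

# Hypothetical helper: one string key per pair, sorted within the pair
# so that ("D", "C") and ("C", "D") produce the same key
pair_keys <- function(v) {
  combn(v, 2, FUN = function(p) paste(sort(p), collapse = "-"))
}

common <- intersect(pair_keys(l1), pair_keys(l2))
print(common)
print(length(common))
```

For vectors without duplicates this count agrees with choose(length(intersect(l1, l2)), 2).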

Subset dataframe into equal subgroup chunks

I have a data frame df that needs subsetting into chunks of 2 names. In the example below, there are 4 unique names: a, b, c, d. I need to subset it into 2 one-column matrices, one for a,b and one for c,d.
Output format:
name1
item_value
item_value
...
END
name2
item_value
item_value
...
END
Example:
#dummy data
df <- data.frame(name=sort(c(rep(letters[1:4],2),"a","a","c")),
item=round(runif(11,1,10)),
stringsAsFactors=FALSE)
#tried approach - split per name. I need to split per 2 names.
lapply(split(df,f=df$name),
function(x)
{name <- unique(x$name)
as.matrix(c(name,x[,2],"END"))
})
#expected output
[,1]
[1,] "a"
[2,] "8"
[3,] "9"
[4,] "6"
[5,] "4"
[6,] "END"
[1,] "b"
[2,] "2"
[3,] "10"
[4,] "END"
[,2]
[1,] "c"
[2,] "6"
[3,] "6"
[4,] "2"
[5,] "END"
[1,] "d"
[2,] "4"
[3,] "1"
[4,] "END"
Note: Actual df has ~300000 rows with ~35000 unique names.
You may try this.
# for each 'name', "pad" 'item' with 'name' and 'END'
l1 <- lapply(split(df, f = df$name), function(x){
name <- unique(x$name)
as.matrix(c(name, x$item, "END"))
})
# create a sequence of offsets, to select list elements two by two
# (assumes an even number of unique names)
steps <- seq(from = 0, to = length(l1) - 2, by = 2)
# loop over 'steps' to bind together list elements, two by two.
l2 <- lapply(steps, function(x){
do.call(rbind, l1[1:2 + x])
})
l2
# [[1]]
# [,1]
# [1,] "a"
# [2,] "6"
# [3,] "4"
# [4,] "10"
# [5,] "3"
# [6,] "END"
# [7,] "b"
# [8,] "6"
# [9,] "7"
# [10,] "END"
#
# [[2]]
# [,1]
# [1,] "c"
# [2,] "2"
# [3,] "6"
# [4,] "10"
# [5,] "END"
# [6,] "d"
# [7,] "5"
# [8,] "4"
# [9,] "END"
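A variation on the same idea, sketched below, assigns a chunk id to each name and splits the list by that id, so no offset arithmetic is needed and an odd number of names simply leaves a final chunk with one name. The seed and the names blocks, chunk_id and chunks are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed, only to make the dummy data reproducible
df <- data.frame(name = sort(c(rep(letters[1:4], 2), "a", "a", "c")),
                 item = round(runif(11, 1, 10)),
                 stringsAsFactors = FALSE)

# One character vector per name: the name, its items, then "END"
blocks <- lapply(split(df, df$name), function(x) c(x$name[1], x$item, "END"))

# Chunk id 1, 1, 2, 2, ... so that every two names share an id
chunk_id <- ceiling(seq_along(blocks) / 2)

# Concatenate the blocks of each chunk into a one-column matrix
chunks <- lapply(split(blocks, chunk_id), function(b) {
  as.matrix(unlist(b, use.names = FALSE))
})
print(chunks)
```

Changing the divisor in chunk_id from 2 to another number groups that many names per chunk.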
Instead of building the list one name at a time, build each element directly from subsets of the data.frame. The first version hard-codes the names; the second generalises it.
res <- list("a_b" = c(df[df$name == "a",2],"END",df[df$name == "b", 2],"END"),
"c_d" = c(df[df$name == "c",2],"END", df[df$name == "d", 2],"END"))
res2 <- vector(mode="list",length=2)
res2 <- sapply(1:(length(unique(df$name))/2),function(x) {
sapply(seq(1,length(unique(df$name))-1,by=2), function(y) {
name <- unique(df$name)
res2[x] <- as.matrix(c(name[y],df[df$name == name[y],2],"END",name[y+1],df[df$name == name[y+1],2],"END"))
})
})
answer <- res2[,1]
This gives a matrix of lists, since there are two nested sapply calls. I think everything you want is in res2[,1].

R: duplicates elimination in a matrix, keeping track of multiplicities

I have a basic problem with R.
I have produced the matrix
M
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "a" "3"
[4,] "c" "1"
I would like to obtain the 3x3 matrix N
[,1] [,2] [,3]
[1,] "a" "1" "3"
[2,] "b" "2" NA
[3,] "c" "1" NA
obtained by eliminating duplicates in M[,1] and writing into N[i,2], N[i,3] the values in M[,2] that correspond to the same element of M[,1], for every i. The NAs in N[,3] correspond to the singletons in M[,1].
I know how to eliminate duplicates from a vector in R. My problem is keeping track of the elements in M[,2] and writing them into the resulting matrix N. I tried for loops, but they do not work well in my "real world" case, where the matrices are much bigger.
Any suggestions?
I thank you very much.
You can use dcast from the reshape2 package after turning your matrix into a data.frame. To reverse the process you can use melt.
df = data.frame(c("a","b","a","c"),c(1:3,1))
colnames(df) = c("factor","obs")
require(reshape2)
df2=dcast(df, factor ~ obs)
now df2 is:
factor 1 2 3
1 a 1 NA 3
2 b NA 2 NA
3 c 1 NA NA
To me it makes more sense to keep it like this. But if you need it in your format:
res = t(apply(df2,1,function(x) { newLine = as.vector(x[which(!is.na(x))],mode="any"); newLine=c(newLine,rep(NA, ncol(df2)-length(newLine) )) }))
res = res[,-ncol(res)]
[,1] [,2] [,3]
[1,] "a" " 1" " 3"
[2,] "b" " 2" NA
[3,] "c" " 1" NA
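If loading reshape2 is not an option, a base R sketch along the following lines produces the same shape directly from the matrix. This is an alternative to the dcast route above, not part of it, and the names groups, width and padded are illustrative.

```r
M <- cbind(c("a", "b", "a", "c"), c("1", "2", "3", "1"))

# Values of column 2 grouped by the key in column 1
groups <- split(M[, 2], M[, 1])

# Widest group decides how many value columns are needed
width <- max(lengths(groups))

# Pad every group with NA to the common width, one row per key
padded <- do.call(rbind, lapply(groups, function(v) {
  c(v, rep(NA, width - length(v)))
}))

N <- cbind(names(groups), padded)
dimnames(N) <- NULL
print(N)
```

The rows come out in sorted key order rather than in order of first appearance, because split orders the groups by factor level.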
