R: eliminating duplicates in a matrix while keeping track of multiplicities

I have a basic problem with R.
I have produced the matrix
M
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "a" "3"
[4,] "c" "1"
I would like to obtain the 3x3 matrix N
[,1] [,2] [,3]
[1,] "a" "1" "3"
[2,] "b" "2" NA
[3,] "c" "1" NA
obtained by removing the duplicates in M[,1] and, for every i, writing into N[i,2] and N[i,3] the values of M[,2] that belong to the same element of M[,1]. The NAs in N[,3] correspond to the elements that appear only once in M[,1].
I know how to remove duplicates from a vector in R; my problem is keeping track of the elements of M[,2] and writing them into the resulting matrix N. I tried for loops, but they do not work well in my "real world" case, where the matrices are much bigger.
Any suggestions?
Thank you very much.

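You can use dcast from the reshape2 package after turning your matrix into a data.frame. To reverse the process, you can use melt.
library(reshape2)
# M as a data frame: "factor" holds the keys, "obs" the values
df <- data.frame(factor = c("a", "b", "a", "c"), obs = c(1:3, 1))
# one row per key, one column per distinct value of obs
df2 <- dcast(df, factor ~ obs, value.var = "obs")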
Now df2 is:
factor 1 2 3
1 a 1 NA 3
2 b NA 2 NA
3 c 1 NA NA
To me it makes more sense to keep it like this. But if you need it in your format:
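# keep each row's non-NA entries, shifted left and padded with NA to the original width
res <- t(apply(df2, 1, function(x) { newLine <- as.vector(x[!is.na(x)], mode = "any"); c(newLine, rep(NA, ncol(df2) - length(newLine))) }))
res <- res[, -ncol(res)]  # in this example the last column is all NA, so drop it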
[,1] [,2] [,3]
[1,] "a" " 1" " 3"
[2,] "b" " 2" NA
[3,] "c" " 1" NA
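If adding the reshape2 dependency is not an option, the same layout can probably be produced in base R. The sketch below is not from the original answer; it assumes M is a two-column character matrix like the one in the question. Note that split() orders the groups alphabetically by key rather than by first appearance.

```r
# Sample input shaped like M in the question (column 1 = key, column 2 = value)
M <- matrix(c("a", "b", "a", "c", "1", "2", "3", "1"), ncol = 2)

# Collect the values of column 2 for each distinct key in column 1
groups <- split(M[, 2], M[, 1])

# Pad every group with NA up to the size of the largest group, then stack the rows
width <- max(lengths(groups))
padded <- do.call(rbind, lapply(groups, `length<-`, width))

# Prepend the keys and drop the row names added by rbind()
N <- unname(cbind(names(groups), padded))
N
```

With the data from the question this gives the 3x3 matrix N shown there, and unlike the apply() route the values keep their original form (no padding spaces).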

Related

Compare two matrices, keeping values in one matrix that are TRUE in the other

This seems like an easy task, but I have not found a solution in R after searching here and elsewhere. I have two matrices, one with string values and another with logical values.
a <- matrix(c(
"A", "B", "C"
))
b <- matrix(c(
T, F, T
))
> b
[,1]
[1,] TRUE
[2,] FALSE
[3,] TRUE
> a
[,1]
[1,] "A"
[2,] "B"
[3,] "C"
I need to create a third matrix that keeps the values of the first matrix where the second is TRUE and has NA everywhere else, like so:
> C
[,1]
[1,] "A"
[2,] NA
[3,] "C"
How do I achieve the above result?
C <- matrix(a[ifelse(b, T, NA)], ncol = ncol(a))
Here is an alternative that simply assigns NA wherever b is FALSE (note that this overwrites a):
a[b==FALSE] <- NA
[,1]
[1,] "A"
[2,] NA
[3,] "C"
Using which():
c<-a
c[which(b==FALSE)]<-NA
a <- a[b] might also work, depending on how you want the result: it keeps only the selected values ("A" and "C") as a plain vector, without NA placeholders.
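Two more base R variants, sketched here as alternatives rather than taken from the answers above: ifelse() returns a result with the attributes (including dim) of its test argument, and replace() keeps the attributes of its first argument, so neither needs an explicit matrix() call and neither modifies a in place.

```r
a <- matrix(c("A", "B", "C"))
b <- matrix(c(TRUE, FALSE, TRUE))

# Result takes the shape of b (the test); positions where b is FALSE become NA
C1 <- ifelse(b, a, NA)

# Copy of a with NA wherever b is FALSE; a itself is left unchanged
C2 <- replace(a, !b, NA)

identical(C1, C2)
```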

Comparing rows of matrix and replacing matching elements

I want to compare two matrices. If the elements of a row in the first matrix match the elements of a row in the second matrix, I want to keep that row of the second matrix. If a row does not match, I want it to be empty (NA). I apologise for asking a quite similar question recently, but I still haven't been able to solve this one.
INPUT:
> mat1<-cbind(letters[3:8])
> mat1
[,1]
[1,] "c"
[2,] "d"
[3,] "e"
[4,] "f"
[5,] "g"
[6,] "h"
> mat2<-cbind(letters[1:5],1:5)
> mat2
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
Expected OUTPUT:
> mat3
[,1] [,2]
[1,] "NA" "NA"
[2,] "NA" "NA"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
I have unsuccessfully attempted this:
> mat3<-mat2[ifelse(mat2[,1] %in% mat1[,1],mat2,""),]
Error in mat2[ifelse(mat2[, 1] %in% mat1[, 1], mat2, ""), ] :
no 'dimnames' attribute for array
I have been struggling for hours, so any suggestions are welcomed.
You were on the right track, but the answer is a little simpler than what you were trying. mat2[, 1] %in% mat1[, 1] returns the matches as a logical vector, and we can just set the non-matches to NA using that vector as an index.
mat1<-cbind(letters[3:8])
mat2<-cbind(letters[1:5],1:5)
keep <- mat2[, 1] %in% mat1[, 1]  # TRUE/FALSE vector of matches (avoids shadowing base::match)
mat3 <- mat2
mat3[!keep, ] <- NA
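For reference, the original ifelse() attempt can also be made to work. This is a sketch rather than part of the answer: pass ifelse() row numbers instead of the matrix itself, because indexing a matrix with an NA row index returns a row of NAs.

```r
mat1 <- cbind(letters[3:8])
mat2 <- cbind(letters[1:5], 1:5)

keep <- mat2[, 1] %in% mat1[, 1]

# Row number for matching rows, NA otherwise: c(NA, NA, 3, 4, 5)
rows <- ifelse(keep, seq_len(nrow(mat2)), NA)

# An NA row index selects a row filled with NA
mat3 <- mat2[rows, ]
mat3
```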

Table of all intersections in two data frames

I have two data frames. Each row of the data frames has a different number of elements (gene names, in fact). I used read.csv("file.csv", fill=TRUE) to read them in, so some of the rows are padded with NA.
Both data frames contain the same elements, but they have been clustered differently, so the elements fall into different groups. I want to output a table of the sizes of the intersections between the rows of the two data frames.
So if
df1<-data.frame(c("a","b","NA","NA"),c("c","d","e","f"),c("g","h","i","NA" ),c("j","NA","NA","NA"))
df2<-data.frame(c("c","e","i","NA"),c("f","g","h","NA"),c("a","b","d","j" ))
then I want to get to something like this:
df1[1,] df1[2,] df1[3,] df1[4,]
df2[1,] 0 2 1 0
df2[2,] 0 1 2 0
df2[3,] 2 1 0 1
It seems like I should be able to do this with intersect() and some kind of apply function, but I can't get my head around it. The nearest thing I could find is "Finding an efficient way to count the number of overlaps between interval sets in two tables?", but as far as I can tell that deals with data tables and numerical overlaps between line segments, not lists of names.
Does anyone have any idea how to do this?
You could do this by looping through the rows of each data frame and then calculating the length of the intersection of the rows, omitting missing values:
apply(df1, 1, function(i) apply(df2, 1, function(j) length(na.omit(intersect(i, j)))))
# [,1] [,2] [,3] [,4]
# [1,] 0 2 1 0
# [2,] 0 1 2 0
# [3,] 2 1 0 1
Sample data:
(df1<-rbind(c("a","b", NA, NA),c("c","d","e","f"),c("g","h","i", NA),c("j", NA, NA, NA)))
# [,1] [,2] [,3] [,4]
# [1,] "a" "b" NA NA
# [2,] "c" "d" "e" "f"
# [3,] "g" "h" "i" NA
# [4,] "j" NA NA NA
(df2<-rbind(c("c","e","i", NA),c("f","g","h", NA),c("a","b","d","j")))
# [,1] [,2] [,3] [,4]
# [1,] "c" "e" "i" NA
# [2,] "f" "g" "h" NA
# [3,] "a" "b" "d" "j"
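A possible vectorized alternative, not part of the answer above: encode each row as a 0/1 membership vector over all distinct elements; a single matrix product then counts the shared elements for every pair of rows. Because membership is tested with %in%, an element repeated within a row is still counted once, as with intersect(). This may be faster than the nested apply() when there are many clusters, though that is an expectation rather than a measured result.

```r
df1 <- rbind(c("a","b", NA, NA), c("c","d","e","f"), c("g","h","i", NA), c("j", NA, NA, NA))
df2 <- rbind(c("c","e","i", NA), c("f","g","h", NA), c("a","b","d","j"))

# All distinct elements; sort() drops NA by default
elements <- sort(unique(c(df1, df2)))

# 0/1 membership matrix: one row per cluster, one column per element
membership <- function(m) t(apply(m, 1, function(row) as.integer(elements %in% row)))

# Entry [i, j] = number of elements shared by row i of df2 and row j of df1
overlap <- tcrossprod(membership(df2), membership(df1))
overlap
```

With the sample data this should reproduce the 3 x 4 table from the nested apply() version, with df2's rows as rows and df1's rows as columns.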

From list of characters to matrix/data-frame of numeric (R)

I have a long list, whose elements are lists of length one containing a character vector. These vectors can have different lengths.
The elements of the vectors are characters, but I would like to convert them to numeric, as they actually represent numbers.
I would like to create a matrix, or a data frame, whose rows are the vectors above, converted to numeric. Since they have different lengths, the shorter rows could be padded with NA on the right.
I am trying to use the function rbind.fill.matrix from the plyr package, but all I get is a long one-dimensional numeric vector containing all the numbers, instead of a matrix.
This is the best I could do to get a list of numeric vectors (dat is my original list):
dat<-sapply(sapply(dat,unlist),as.numeric)
How can I create the matrix now?
Thank you!
I would do something like:
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
The basic idea is that stri_list2matrix converts the list to a matrix, but it is still a character matrix. as.numeric drops the dimension attributes of the matrix, so we add them back with:
`dim<-` ## Yes, the backticks are required -- or at least quotes
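As a side note on the `dim<-` step, base R can also convert the values in place: assigning a new storage mode changes the type but, unlike as.numeric(), keeps the dim attribute. A small sketch with a hypothetical character matrix:

```r
temp <- matrix(c("1", "2", NA, "3"), nrow = 2)

# as.numeric() returns a plain vector: the dimensions are lost
v <- as.numeric(temp)

# storage.mode<- converts the values to double and keeps the 2 x 2 shape
storage.mode(temp) <- "double"
temp
```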
Proof of concept:
dat <- list(1:2, 1:3, 1:2, 1:5, 1:6)
dat <- lapply(dat, as.character)
dat
# [[1]]
# [1] "1" "2"
#
# [[2]]
# [1] "1" "2" "3"
#
# [[3]]
# [1] "1" "2"
#
# [[4]]
# [1] "1" "2" "3" "4" "5"
#
# [[5]]
# [1] "1" "2" "3" "4" "5" "6"
library(stringi)
temp <- stri_list2matrix(dat, byrow = TRUE)
final <- `dim<-`(as.numeric(temp), dim(temp))
final
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 NA NA NA NA
# [2,] 1 2 3 NA NA NA
# [3,] 1 2 NA NA NA NA
# [4,] 1 2 3 4 5 NA
# [5,] 1 2 3 4 5 6
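If you would rather avoid the stringi dependency, a base R sketch (not from the answer above) is to convert each element, pad it with NA by indexing past its end, and bind the rows. unlist() is included because, in the question, each element is itself a list wrapping the character vector; it is harmless for the plain vectors used here.

```r
dat <- lapply(list(1:2, 1:3, 1:2, 1:5, 1:6), as.character)

# unlist() also handles the question's case, where each element is a list wrapping a vector
vals <- lapply(dat, function(x) as.numeric(unlist(x)))

# Indexing past the end of a vector returns NA, which pads the shorter rows
n <- max(lengths(vals))
final <- do.call(rbind, lapply(vals, `[`, seq_len(n)))
final
```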

Convert a list to vectors in R (the 1st vector consists of the 1st element of each list element, etc.)

I am wondering how to convert a list to vectors in R, where each vector collects one position across all elements of the list. In particular, I would like the 1st vector to consist of the 1st element of each element of the list, the 2nd vector of the 2nd element of each element, and, more generally, the nth vector of the nth element of each element of the list. Thus, the number of vectors equals the length of the longest element of the list.
For example, suppose we had:
mylist = list(c("a", "b"), c(character(0)), c(1, 2, 3))
I would like to create three vectors where
first_vector = c("a", NA, 1)
second_vector = c("b", NA, 2)
third_vector = c(NA, NA, 3)
As the example shows, missing values are an additional complication: positions that do not exist in an element should become NA.
Thank you so much in advance for your help!
-Vincent
Creating many separate objects in your global environment is usually bad practice. Since all your vectors have the same length, you could create a single matrix instead:
indx <- max(lengths(mylist))
sapply(mylist, `length<-`, indx)
# [,1] [,2] [,3]
# [1,] "a" NA "1"
# [2,] "b" NA "2"
# [3,] NA NA "3"
You could also consider stri_list2matrix from the "stringi" package. Depending on the orientation desired:
library(stringi)
stri_list2matrix(mylist)
# [,1] [,2] [,3]
# [1,] "a" NA "1"
# [2,] "b" NA "2"
# [3,] NA NA "3"
stri_list2matrix(mylist, byrow = TRUE)
# [,1] [,2] [,3]
# [1,] "a" "b" NA
# [2,] NA NA NA
# [3,] "1" "2" "3"
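If you do need one vector per position, as in the question, a sketch that builds on the sapply() approach keeps them together in a list rather than creating separate global objects. Note that sapply() coerces the numeric element to character, so every resulting vector is character.

```r
mylist <- list(c("a", "b"), character(0), c(1, 2, 3))
n <- max(lengths(mylist))

# Column k holds element k of mylist, padded with NA; row i holds all the i-th entries
m <- sapply(mylist, `length<-`, n)

# vectors[[i]] plays the role of first_vector, second_vector, ...
vectors <- lapply(seq_len(n), function(i) m[i, ])
vectors[[1]]
```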
