Related
Suppose I have an m x n matrix M1 and an k x l matrix M2 with l <= n. I want to find all those rows in M1 that contain some row of M2 in it.
For example consider the following situation:
> M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
> M1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> M2
[,1] [,2]
[1,] 1 3
[2,] 8 9
Then rows one and three of M1 fulfill the condition, as row one contains 1 and 3 and the last row 8 and 9.
So how to achieve this in an efficient way? I have written code using loops, but as I am working with very large matrices this solution takes to much time.
This method will check each row from M2 and will return the row index from M1 if it is contained or NA in case it is not
M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)
> M2
[,1] [,2]
[1,] 1 3
[2,] 8 5
[3,] 4 5
[4,] 1 2
y = apply(M2,1,function(x){
z = unique(which(M1 %in% x)%%nrow(M1))
ifelse(length(z)==1,ifelse(z==0,nrow(M1),z),NA)
})
> y
[1] 1 NA 2 1
This means that row 2 from M2 is not in M1, and that rows 1 and 4 from M2 are in row 1 in M1. Also row 3 in M2 is in row 2 in M1.
More general example:
M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol =2, byrow = TRUE)
First, use match to find indexes of the matching values in M1.
ind <- match(M1, M2)
Now, using the mod operator %% with the indexes and the number of rows, you'll find the rows. This works because the indexes will always be the M2 row plus the total number of rows, so numbers in the same row will return the same result.
rows <- ind %% nrow(M2)
Then, m is a matrix containing row number of matching values between M1 and M2. Lines will only be selected if the same index appear in the same row 2 times (or, more generally, the number of times equal to the number of columns in M2). This assures that a row of M1 is only be considered if it contains all elements of a row in M2.
m <- matrix(rows, nrow = nrow(M1))
matchRows <- apply(m, 1, duplicated, incomparables = NA)
M1rows <- which(colSums(matchRows)==ncol(M2)-1)
I have a data frame like
df<-data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
a b c d
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
I want to use every row to create 3 (nrow(df)) 2*2 matrixes. 1st use 1,4,7,10, 2nd use 2,5,8,11, 3rd use 3,6,9,12. So that I can get 3 matrixes. Thank you.
We can use split to split up the dataset into list and use matrix
lapply(split.default(as.matrix(df), row(df)), matrix, 2)
If we need the matrix columns to be 1, 7 followed by 4, 10, use the byrow=TRUE
lapply(split.default(as.matrix(df), row(df)), matrix, 2, byrow=TRUE)
Or use apply with MARGIN = 1 and wrap it with list to get a list output
do.call("c", apply(df, 1, function(x) list(matrix(x, ncol=2))))
If we need a for loop, preassign a as a list with length equal to the number of rows of 'df'
a <- vector("list", nrow(df))
for(i in 1:nrow(df)){ a[[i]] <- matrix(unlist(df[i,]), ncol=2)}
a
Or if it can be stored as array
array(t(df), c(2, 2, 3))
Or using map:
m <- matrix(c(t(df)), ncol = 2, byrow = T)
p <- 2 # number of rows
Map(function(i,j) m[i:j,], seq(1,nrow(m),p), seq(p,nrow(m),p))
# [[1]]
# [,1] [,2]
# [1,] 1 4
# [2,] 7 10
# [[2]]
# [,1] [,2]
# [1,] 2 5
# [2,] 8 11
# [[3]]
# [,1] [,2]
# [1,] 3 6
# [2,] 9 12
I'd like to insert a dataframe into a dataframe element, such that if I called:df1[1,1] I would get:
[A B]
[C D]
I thought this was possible in R but perhaps I am mistaken. In a project of mine, I am essentially working with a 50x50 matrix, where I'd like each element to contain column of data containing numbers and labeled rows.
Trying to do something like df1[1,1] <- df2 yields the following warning
Warning message:
In [<-.data.frame(*tmp*, i, j, value = list(DJN.10 = c(0, 3, :
replacement element 1 has 144 rows to replace 1 rows
And calling df1[1,1] yields 0 . I've tried inserting the data in various ways, as with as.vector() and as.list() to no success.
Best,
Perhaps a matrix could work for you, like so:
x <- matrix(list(), nrow=2, ncol=3)
print(x)
# [,1] [,2] [,3]
#[1,] NULL NULL NULL
#[2,] NULL NULL NULL
x[[1,1]] <- data.frame(a=c("A","C"), b=c("B","D"))
x[[1,2]] <- data.frame(c=2:3)
x[[2,3]] <- data.frame(x=1, y=2:4)
x[[2,1]] <- list(1,2,3,5)
x[[1,3]] <- list("a","b","c","d")
x[[2,2]] <- list(1:5)
print(x)
# [,1] [,2] [,3]
#[1,] List,2 List,1 List,4
#[2,] List,4 List,1 List,2
x[[1,1]]
# a b
#1 A B
#2 C D
class(x)
#[1] "matrix"
typeof(x)
#[1] "list"
See here for details.
Each column in your data.frame can be a list. Just make sure that the list is as long as the number of rows in your data.frame.
Columns can be added using the standard $ notation.
Example:
x <- data.frame(matrix(NA, nrow=2, ncol=3))
x$X1 <- I(list(data.frame(a=c("A","C"), b=c("B","D")), matrix(1:10, ncol = 5)))
x$X2 <- I(list(data.frame(c = 2:3), list(1, 2, 3, 4)))
x$X3 <- I(list(list("a", "b", "c"), 1:5))
x
# X1 X2 X3
# 1 1:2, 1:2 2:3 a, b, c
# 2 1, 2, 3,.... 1, 2, 3, 4 1, 2, 3,....
x[1, 1]
# [[1]]
# a b
# 1 A B
# 2 C D
#
x[2, 1]
# [[1]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 5 7 9
# [2,] 2 4 6 8 10
Is it possible to have named rows and columns in Matrices?
for example:
[,a] [,b]
[a,] 1 , 2
[b,] 3 , 4
Is it even reasonable to have such a thing for exploring the data?
Sure. Use dimnames:
> a <- matrix(1:4, nrow = 2)
> a
[,1] [,2]
[1,] 1 3
[2,] 2 4
> dimnames(a) <- list(c("A", "B"), c("AA", "BB"))
> a
AA BB
A 1 3
B 2 4
With dimnames, you can provide a list of (first) rownames and (second) colnames for your matrix. Alternatively, you can specify rownames(x) <- whatever and colnames(x) <- whatever.
When dimnames is currently NULL, is it possible to re-name a matrix's dimestions one at a time?
For example, this fails:
mtx <- matrix(1:16,4)
dimnames(mtx)[[2]][1] <- 'col1'
with Error in dimnames(mtx)[[2]][1] <- "col1" : 'dimnames' must be a list
However this works:
mtx <- matrix(1:16,4)
dimnames(mtx)[[1]] <- letters[1:4]
dimnames(mtx)[[2]] <- LETTERS[1:4]
dimnames(mtx)[[2]][1] <- 'col1'
dimnames(mtx)[[2]][2] <- 'col2'
My objective is to seperately replace dimnames(mtx)[[2]][1] and dimnames(mtx)[[2]][2] etc ... if this is not possible, i can re-write the loop.
Thanks folks, I have ended up with the below -- I pass the names in via prepend:
mtxNameSticker <- function(mtx, prepend = NULL, MARGIN=2)
{
if (MARGIN == 1) max <- nrow(mtx) else
max <- ncol(mtx)
if (is.null(prepend)) if (MARGIN == 2) prepend <- 'C' else
prepend <- 'R'
if (length(prepend) == 1) prepend <- paste0(prepend, 1:dim(mtx)[[MARGIN]])
dimnames(mtx)[[MARGIN]] <- seq(from=1, by=1, length.out=dim(mtx)[[MARGIN]])
for (i in 1:max){
dimnames(mtx)[[MARGIN]][i] <- prepend[i]
}
return(mtx)
}
For as long as dimnames is NULL and not an appropriate list, you cannot make assignments to it at particular positions. One easy way to create a dummy but complete list of dimnames is to run:
dimnames(mtx) <- lapply(dim(mtx), seq_len)
mtx
# 1 2 3 4
# 1 1 5 9 13
# 2 2 6 10 14
# 3 3 7 11 15
# 4 4 8 12 16
Then, you can make assignments one at a time like you were wishing:
dimnames(mtx)[[2]][1] <- 'col1'
mtx
# col1 2 3 4
# 1 1 5 9 13
# 2 2 6 10 14
# 3 3 7 11 15
# 4 4 8 12 16
You are assigning a vector even though you are asked to supply a list.
Try this:
R> M <- matrix(1:4,2,2)
R> M
[,1] [,2]
[1,] 1 3
[2,] 2 4
R>
Columns:
R> M1 <- M; dimnames(M1) <- list(NULL, c("a","b")); M1
a b
[1,] 1 3
[2,] 2 4
R>
Rows:
R> M2 <- M; dimnames(M2) <- list(c("A","B"), NULL); M2
[,1] [,2]
A 1 3
B 2 4
R>
In response to your comment. #DirkEddelbuettel is correct, you are assigning a vector to what should be a list.
The reason for this is that you are assigning dimnames when the dimnames are NULL (not assigned yet)
The way R evaluates the following
x <- NULL
x[[2]][1] <- 'col1'
str(x)
## chr [1:2] NA "col1"
R returns a vector of length 2, not a list of length 2.
For your assignment to work, R would have to evaluate
x <- NULL
x[[2]][1] <- 'col1'
str(x)
to give
## List of 2
## $ : NULL
## $ : chr "col1"
Which is what would happen if x was originally defined as x <- list(NULL,NULL)
however, the dimnames must be NULL or a list of appropriate length vectors
The following does work (and is really #flodel solution)
dimnames(mtx) <- list(character(nrow(mtx)), character(ncol(mtx)))
# or
# dimnames(mtx) <- lapply(dim(mtx), character)
dimnames(mtx)[[2]][1] <- 'col1'
It seems you are allowed to set the name of the dimension without actually having any names for the dimension:
dimnames(mtx) = list(NULL,col1=NULL)
mtx
# col1
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16