Melt the lower half of a matrix in R

How can I melt the lower triangle (including the diagonal) of a matrix?
11 NA NA NA NA
12 22 NA NA NA
13 23 33 NA NA
14 24 34 44 NA
15 25 35 45 55
A <- t(matrix (c(11, NA, NA, NA, NA, 12, 22, NA, NA, NA,
13, 23, 33, NA, NA, 14, 24, 34, 44, NA,15, 25,
35, 45, 55), ncol = 5))
> A
[,1] [,2] [,3] [,4] [,5]
[1,] 11 NA NA NA NA
[2,] 12 22 NA NA NA
[3,] 13 23 33 NA NA
[4,] 14 24 34 44 NA
[5,] 15 25 35 45 55
into a data.frame with col, row and value columns, preserving the following order:
col row value
1 1 11
1 2 12
1 3 13
1 4 14
1 5 15
2 2 22
2 3 23
2 4 24
2 5 25
3 3 33
3 4 34
3 5 35
4 4 44
4 5 45
5 5 55

If you want the indices as columns as well, this should work:
m <- matrix(1:25,5,5)
m[upper.tri(m)] <- NA
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 NA NA NA NA
[2,] 2 7 NA NA NA
[3,] 3 8 13 NA NA
[4,] 4 9 14 19 NA
[5,] 5 10 15 20 25
cbind(which(!is.na(m),arr.ind = TRUE),na.omit(as.vector(m)))
row col
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 4 1 4
[5,] 5 1 5
[6,] 2 2 7
[7,] 3 2 8
[8,] 4 2 9
[9,] 5 2 10
[10,] 3 3 13
[11,] 4 3 14
[12,] 5 3 15
[13,] 4 4 19
[14,] 5 4 20
[15,] 5 5 25
I guess I'll explain this a bit. I'm using three "tricks":
The arr.ind argument to which to get the indices
The very useful na.omit function to avoid some extra typing
The fact that R stores matrices in column major form, hence as.vector returns the values in the right order.
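The three tricks above can be combined into a data frame with the column order the question asks for. This is a minimal sketch, assuming the question's matrix A; the name out is introduced here only for illustration.

```r
# Rebuild the question's matrix: lower triangle plus diagonal, NA elsewhere
A <- matrix(NA_real_, nrow = 5, ncol = 5)
A[lower.tri(A, diag = TRUE)] <- c(11:15, 22:25, 33:35, 44:45, 55)

# which(..., arr.ind = TRUE) returns a two-column matrix named "row" and "col",
# listed in column-major order (down column 1, then column 2, ...)
idx <- which(!is.na(A), arr.ind = TRUE)

# Indexing A by that two-column matrix picks out the matching values
out <- data.frame(col = idx[, "col"], row = idx[, "row"], value = A[idx])
head(out, 3)
```

Because which() walks the matrix column by column, the rows of out already come in the col-then-row order shown in the question, so no extra sorting should be needed.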

My one liner.
reshape2::melt(A, varnames = c('row', 'col'), na.rm = TRUE)

Here's my first solution:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- as.vector(test) ## "melt" it into a vector
test <- cbind( test2[!is.na(test2)] ) ## get rid of NAs, cbind it into a column
Results are:
> test
[,1]
[1,] 11
[2,] 12
[3,] 13
[4,] 14
[5,] 15
[6,] 22
[7,] 23
[8,] 24
[9,] 25
[10,] 33
[11,] 34
[12,] 35
[13,] 44
[14,] 45
[15,] 55
Alternatively, you can use the matrix command:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- matrix(test, ncol=1)
test <- cbind( test2[!is.na(test2), ] )
## same as above, except now explicitly noting rows to replace.

Here is my attempt:
# enter the data
df <- c(11,12,13,14,15,NA,22,23,24,25,NA,NA,33,34,35,NA,NA,NA,44,45,NA,NA,NA,NA,55)
dim(df) <- c(5,5)
df
# make new data frame with rows and column indicators
melteddf <- data.frame(
value=df[lower.tri(df,diag=T)],
col=rep(1:ncol(df),ncol(df):1),
row=unlist(sapply(1:nrow(df),function(x) x:nrow(df)))
)
I wish I had known about the arr.ind argument of which before now, though.
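The index columns in the attempt above can also be read straight from the matrix with row() and col(), which avoids the rep()/sapply() bookkeeping. A small sketch under the same assumptions (a 5 x 5 matrix called df holding the question's data); the variable keep is hypothetical.

```r
# Same data as above, built from the lower triangle
df <- matrix(NA_real_, 5, 5)
df[lower.tri(df, diag = TRUE)] <- c(11:15, 22:25, 33:35, 44:45, 55)

# Logical mask of the cells to keep (lower triangle including the diagonal)
keep <- lower.tri(df, diag = TRUE)

# row(df) and col(df) are matrices of row and column numbers;
# subsetting all three with the same mask keeps them aligned
melteddf <- data.frame(
  col   = col(df)[keep],
  row   = row(df)[keep],
  value = df[keep]
)
```

Unlike filtering on NA, this keeps a cell even if a value inside the lower triangle happens to be missing.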

Here is a method using arrayInd, which is basically the same as @joran's but might be useful in other settings:
na.omit( data.frame(arrayInd(1:prod(dim(A)), dim(A)), value=c(A)) )
X1 X2 value
1 1 1 11
2 2 1 12
3 3 1 13
4 4 1 14
5 5 1 15
7 2 2 22
8 3 2 23
9 4 2 24
10 5 2 25
13 3 3 33
14 4 3 34
15 5 3 35
19 4 4 44
20 5 4 45
25 5 5 55

How to produce a randomized number sequence within seven blocks without boundary repeats using R?

There are 9 treatments and we want to have 7 blocks. In each block, the treatment should be repeated once.
The 9 treatments are marked as follows:
-Treatment 1 (1-7)
-Treatment 2 (8-14)
-Treatment 3 (15-21)
-Treatment 4 (22-28)
-Treatment 5 (29-35)
-Treatment 6 (36-42)
-Treatment 7 (43-49)
-Treatment 8 (50-56)
-Treatment 9 (57-63)
Each number represents a pot. We want these pots randomised into 7 blocks (columns), but we don't want two pots of the same treatment adjacent to each other.
How would I go about this in R?
If I'm interpreting it correctly, this should work.
We'll do a two-step sampling:
First, sample the treatment group itself, which makes it much easier to check whether a given row in a block has the same treatment group as the same row in the previous block.
Second, sample one pot from each of the proven-safe groups.
I'll set a random seed here for reproducibility; do not use set.seed(.) in production.
set.seed(42)
nBlocks <- 7
treatments <- list(1:7, 8:14, 15:21, 22:28, 29:35, 36:42, 43:49, 50:56, 57:63)
blocks <- Reduce(function(prev, ign) {
while (TRUE) {
this <- sample(length(treatments))
if (!any(this == prev)) break
}
this
}, seq.int(nBlocks)[-1], init = sample(length(treatments)), accumulate = TRUE)
blocks <- do.call(cbind, blocks)
blocks
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 3 4 2 8 2 1
# [2,] 5 1 2 4 5 7 9
# [3,] 9 8 9 3 1 3 5
# [4,] 7 9 3 6 7 9 3
# [5,] 2 4 8 5 4 1 4
# [6,] 4 7 1 9 6 4 2
# [7,] 8 6 5 7 2 6 8
# [8,] 3 5 6 8 9 5 6
# [9,] 6 2 7 1 3 8 7
Here each column is a "block", and each number represents the treatment group assigned to each row. You can see that no row contains the same group in adjacent columns.
For instance, the first column ("block 1") will have something from the Treatment 1 group in the first row, Treatment 5 group in row two, etc. Further, inspection will show that all treatments are included in each block column, an inferred requirement of the experimental design.
(FYI, it is theoretically possible that this will take a while based on the random conditions. Because it repeats per-column, it should be relatively efficient, though. I have no safeguards here for too-long-execution, but I don't think it is required: the conditions here do not lend to a high likelihood of "failure" requiring much repetition.)
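If a safeguard against overly long execution is wanted, the retry loop can be capped. This is only a sketch: the function name sample_block and the max_tries argument are hypothetical and not part of the code above.

```r
# Draw a permutation of the group numbers that differs from `prev` in every
# row, giving up after `max_tries` attempts instead of looping forever.
sample_block <- function(prev, n_groups, max_tries = 1000L) {
  for (i in seq_len(max_tries)) {
    this <- sample(n_groups)
    if (!any(this == prev)) return(this)
  }
  stop("no valid block found after ", max_tries, " attempts")
}

set.seed(42)
first  <- sample(9)
second <- sample_block(first, n_groups = 9)
```

It could replace the while (TRUE) loop inside the Reduce call, so that an impossible set of constraints fails with an error rather than hanging.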
The next step is to convert each of these group numbers into a number from the respective treatment group.
apply(blocks, 1:2, function(ind) sample(treatments[[ind]], 1))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 6 17 22 11 54 14 3
# [2,] 30 3 13 22 33 48 58
# [3,] 63 55 61 15 4 21 33
# [4,] 49 60 21 36 43 58 21
# [5,] 12 25 55 32 27 7 25
# [6,] 24 46 4 58 38 28 11
# [7,] 53 38 35 49 11 36 56
# [8,] 16 29 36 56 63 29 40
# [9,] 36 8 47 3 19 50 43
To verify: in the first matrix, the first three rows of block 1 were 1, 5, and 9, which should translate into 1-7, 29-35, and 57-63, respectively. "6" is within 1-7, "30" is within 29-35, and "63" is within 57-63. Inspection will show the remainder to be correct.
Because of the step of determining treatment groups first, it is much simpler to verify/guarantee that you will not repeat treatment groups in a row between two adjacent blocks.
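The manual inspection can also be done in code. The sketch below assumes the groups are consecutive runs of 7 pot numbers, as in the question; the helper names group_of and no_adjacent_repeat are made up for this example, and only the first three rows of the two printed matrices are used.

```r
# Map a pot number to its treatment group (pots 1-7 -> 1, 8-14 -> 2, ...)
group_of <- function(pot, size = 7L) (pot - 1L) %/% size + 1L

# TRUE if no row holds the same value in two neighbouring columns
no_adjacent_repeat <- function(m) {
  all(m[, -1, drop = FALSE] != m[, -ncol(m), drop = FALSE])
}

# First three rows of the pot matrix and of the group matrix printed above
pots <- rbind(c( 6, 17, 22, 11, 54, 14,  3),
              c(30,  3, 13, 22, 33, 48, 58),
              c(63, 55, 61, 15,  4, 21, 33))
groups <- rbind(c(1, 3, 4, 2, 8, 2, 1),
                c(5, 1, 2, 4, 5, 7, 9),
                c(9, 8, 9, 3, 1, 3, 5))

all(group_of(pots) == groups)   # every pot belongs to its assigned group
no_adjacent_repeat(groups)      # no group repeats across adjacent blocks
```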
EDIT
Rules:
The same treatment group may not be on the same row in adjacent columns; and
The same treatment (not group) may not be in any row in adjacent columns.
We can use the same methodology as before. Note that as any groups become smaller, the iteration time may increase but I do not expect it likely to get into an infinite loop. (However, if you inadvertently have a group of length 1, then ... this will never end.)
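A related caveat for length-1 groups: as documented in ?sample, sample(x, 1) treats a single number x >= 1 as the range 1:x, so it may return a pot that is not in the group at all. A minimal sketch of a safer draw, using a hypothetical helper pick_one that indexes into the vector instead:

```r
# Pick one element of x by position, so a length-1 vector returns itself
pick_one <- function(x) x[sample.int(length(x), 1L)]

set.seed(42)
pick_one(50L)    # always 50
pick_one(8:14)   # some value between 8 and 14
```

This does not remove the never-ending loop described above; it only keeps the draw inside the group.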
nBlocks <- 7
treatments <- list(1:7, 8:14, 15:21, 22:28, 29:35, 36:42, 43:49, 50:56, 57:63)
# helper function for randomized selection of treatments given groups
func <- function(grp) cbind(grp, sapply(treatments[grp], sample, size = 1))
set.seed(42)
func(c(1,3,5))
# grp
# [1,] 1 1
# [2,] 3 19
# [3,] 5 29
And then the same Reduce mindset:
set.seed(42)
blocks <- Reduce(function(prev, ign) {
while (TRUE) {
this1 <- sample(length(treatments))
if (!any(this1 == prev[,1])) break
}
while (TRUE) {
this2 <- func(this1)
if (!any(this2[,2] %in% prev[,2])) break
}
this2
}, seq.int(nBlocks-1), init = func(sample(length(treatments))), accumulate = TRUE)
blocks <- do.call(cbind, blocks)
groups <- blocks[, seq(1, by = 2, length.out = nBlocks)]
treats <- blocks[, seq(2, by = 2, length.out = nBlocks)]
From this, we have two products (though you will likely only care about the second):
The treatment groups, good to verify rule 1 above: no group may be in the same row in adjacent columns:
groups
# grp grp grp grp grp grp grp
# [1,] 1 3 1 7 8 5 1
# [2,] 5 1 2 8 2 7 3
# [3,] 9 8 5 2 1 4 6
# [4,] 7 9 6 3 4 8 5
# [5,] 2 4 7 9 3 9 4
# [6,] 4 7 4 5 7 1 2
# [7,] 8 6 9 1 9 6 7
# [8,] 3 5 8 6 5 2 9
# [9,] 6 2 3 4 6 3 8
The treatments themselves, for rule 2 above, where no treatment may be in adjacent columns:
treats
#
# [1,] 7 19 2 47 51 33 3
# [2,] 35 4 12 50 8 44 15
# [3,] 60 51 35 10 1 22 41
# [4,] 43 58 41 21 26 55 31
# [5,] 12 24 43 57 17 57 26
# [6,] 27 49 26 34 48 6 11
# [7,] 53 36 62 6 62 36 47
# [8,] 16 33 54 42 32 10 62
# [9,] 37 9 15 27 37 18 56
Edit 2:
Another rule:
Each treatment group must be seen exactly once in each row and column (requiring a square experimental design).
I think this is effectively generating a sudoku-like matrix of treatment groups, and once that is satisfied, backfill rule #2 (no repeat treatments in adjacent columns). One way (though it is hasty) is suggested by https://gamedev.stackexchange.com/a/138228:
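The snippet that follows calls a helper rot that is not defined in this excerpt. Judging from the printed ind matrix, it appears to rotate a vector to the left by n positions; the definition below is a sketch under that assumption, not necessarily the author's original.

```r
# Circular left rotation: move the first n elements of x to the end
rot <- function(x, n) {
  n <- n %% length(x)
  if (n == 0) x else c(x[-seq_len(n)], x[seq_len(n)])
}

rot(1:9, 3)
# 4 5 6 7 8 9 1 2 3
```

Each column of ind is then the same random vector shifted by a different offset (0, 3, 6, 7, 1, 4, 5, 8, 2 after taking the cumulative sums modulo 9), which is why every row and every column ends up containing each group exactly once.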
set.seed(42)
vec <- sample(9)
ind <- sapply(cumsum(c(0, 3, 3, 1, 3, 3, 1, 3, 3)), rot, x = vec)
apply(ind, 1, function(z) all(1:9 %in% z)) # all rows have all 1-9, no repeats
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
apply(ind, 2, function(z) all(1:9 %in% z)) # ... columns ...
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
ind
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 7 8 3 5 2 4 6 9
# [2,] 5 2 3 6 9 4 8 1 7
# [3,] 9 4 6 1 7 8 3 5 2
# [4,] 7 8 1 5 2 3 6 9 4
# [5,] 2 3 5 9 4 6 1 7 8
# [6,] 4 6 9 7 8 1 5 2 3
# [7,] 8 1 7 2 3 5 9 4 6
# [8,] 3 5 2 4 6 9 7 8 1
# [9,] 6 9 4 8 1 7 2 3 5
This produces a rather rigid style of random group arrangement given the constraints on groups. Since this is a design of experiments, if you're going to use this method (and proximity between blocks is at all a concern), then you should likely randomize the columns and/or rows of the ind matrix before sampling the treatments themselves. (You can shuffle both columns and rows; just do them piece-wise, and the constraints should be preserved.)
ind <- ind[sample(9),][,sample(9)]
ind
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 2 3 8 1 4 7 9 6 5
# [2,] 7 8 4 6 2 9 5 3 1
# [3,] 1 7 9 4 5 6 3 2 8
# [4,] 8 1 6 9 3 4 2 5 7
# [5,] 5 2 7 8 9 1 6 4 3
# [6,] 3 5 1 7 6 8 4 9 2
# [7,] 4 6 3 5 8 2 7 1 9
# [8,] 6 9 5 2 1 3 8 7 4
# [9,] 9 4 2 3 7 5 1 8 6
From here, we can enact rule 2 (note that rbind stacks each block as a row, so mtx is laid out as the transpose of ind):
treatments <- list(1:7, 8:14, 15:21, 22:28, 29:35, 36:42, 43:49, 50:56, 57:63)
mtx <- do.call(rbind, Reduce(function(prev, ind) {
while (TRUE) {
this <- sapply(treatments[ind], sample, size = 1)
if (!any(prev %in% this)) break
}
this
}, asplit(ind, 2)[-1],
init = sapply(treatments[ind[,1]], sample, size = 1),
accumulate = TRUE))
mtx
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 11 44 4 52 30 15 23 41 59
# [2,] 16 56 49 3 12 33 39 57 27
# [3,] 52 24 60 40 46 2 20 29 13
# [4,] 1 37 23 63 56 48 32 12 17
# [5,] 24 10 30 16 58 39 50 2 47
# [6,] 49 57 41 25 6 52 11 17 34
# [7,] 59 31 19 14 38 23 47 51 7
# [8,] 41 17 11 33 24 61 5 43 54
# [9,] 29 4 51 45 20 8 58 28 40

R: initialize /create data frame in for loop

I wonder what would be the best or most appropriate way to create and modify a data frame in a for-loop, using cbind or rbind. For the first iteration, the data frame has no columns or rows, so (in the example below) cbind does not work. I need the if-else inside the for-loop only for this first case. Is there a more elegant way of writing the code below, i.e. without the if-else?
mydat <- data.frame()
for (j in 1:10) {
  if (ncol(mydat) == 0)
    mydat <- data.frame(sample(x = j * 5, size = 20, replace = TRUE))
  else
    mydat <- cbind(mydat, data.frame(sample(x = j * 5, size = 20, replace = TRUE)))
}
colnames(mydat) <- sprintf("x%i", 1:10)
Here is a simple way to combine lapply and the do.call(cbind, list) convention for generating the data.frame you want.
set.seed(1234)
gendata <- function(x) {
sample(x = x*5, size = 20, replace = T)
}
do.call(cbind, lapply(1:10, gendata))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 4 9 18 24 2 27 21 20 24
# [2,] 4 4 10 1 12 17 33 40 26 18
# [3,] 4 2 5 7 4 9 35 13 20 31
# [4,] 4 1 10 1 14 7 33 20 11 4
# [5,] 5 3 5 5 5 5 18 15 4 48
# [6,] 4 9 8 15 23 10 10 26 29 2
# [7,] 1 6 11 7 10 5 9 30 20 43
# [8,] 2 10 8 11 8 4 18 23 4 32
# [9,] 4 9 4 2 5 14 18 40 37 16
# [10,] 3 1 12 12 23 2 12 24 15 38
# [11,] 4 5 2 3 5 22 34 18 35 32
# [12,] 3 3 5 18 23 4 23 10 27 50
# [13,] 2 4 11 1 4 29 5 4 32 7
# [14,] 5 6 8 16 4 4 15 35 20 45
# [15,] 2 2 3 2 3 7 33 10 16 41
# [16,] 5 8 8 11 13 28 17 40 35 42
# [17,] 2 3 8 8 8 29 32 25 20 42
# [18,] 2 3 12 2 1 9 21 40 26 37
# [19,] 1 10 3 7 8 4 23 16 6 50
# [20,] 2 9 13 14 19 24 31 23 14 32
EDIT:
As was pointed out by Konrad Rudolph, the result I provided was a matrix not a data.frame. Just convert the matrix using as.data.frame:
set.seed(1234)
gendata <- function(x) {
sample(x = x*5, size = 20, replace = T)
}
dat <- as.data.frame(do.call(cbind, lapply(1:10, gendata)))
names(dat) <- sprintf("x%i", 1:10)
head(dat)
# x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
# 1 1 4 9 18 24 2 27 21 20 24
# 2 4 4 10 1 12 17 33 40 26 18
# 3 4 2 5 7 4 9 35 13 20 31
# 4 4 1 10 1 14 7 33 20 11 4
# 5 5 3 5 5 5 5 18 15 4 48
# 6 4 9 8 15 23 10 10 26 29 2
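The matrix step can also be skipped by naming the list of columns and converting it directly. This is a sketch of one possible variant, not a correction of the answer above; the variable cols is introduced here for illustration.

```r
set.seed(1234)

# One list element per column; no special case is needed for the first one
cols <- lapply(1:10, function(j) sample(x = j * 5, size = 20, replace = TRUE))
names(cols) <- sprintf("x%i", 1:10)

# A named list of equal-length vectors converts straight to a data frame
dat <- as.data.frame(cols)
str(dat)
```

Going through a list rather than a matrix also keeps working if the columns have different types, since a matrix would coerce them all to one type.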

reordering rows of matrix by data sequence inside [duplicate]

Q. I have an Erdős-Rényi graph. I infect a vertex and want to see what sequence of vertices the disease would follow. igraph has helpful functions like get.adjacency() and neighbors().
Details. This is the adjacency matrix with vertex names instead of 0/1 flags, and I'm trying to get the contagion chain out of it, i.e. the flow/sequence of an epidemic through the graph if a certain vertex is infected. Let's not worry about infection probabilities here (assume all vertices hit are infected with probability 1).
So suppose I hit vertex 1 (which is row 1 here). We see that it has outgoing links to vertices 4, 5, 18, 22, 23, 24 and 25. So the next vertices will be those connected to 4, 5, 18, ..., 25, i.e. the values in row 4, row 5, row 18, ..., row 25. Then, according to the model, the disease will travel through these and so forth.
I understand that I can pass a string to order the matrix rows. My problem is, I cannot figure out how to generate that sequence.
The matrix looks like this.
> channel
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 4 5 18 22 23 24 25 NA
[2,] 6 10 11 18 25 NA NA NA
[3,] 7 11 18 20 NA NA NA NA
[4,] 24 NA NA NA NA NA NA NA
[5,] 1 3 9 13 14 NA NA NA
[6,] 3 8 9 14 19 23 NA NA
[7,] 3 4 8 15 20 22 NA NA
[8,] 2 3 25 NA NA NA NA NA
[9,] 3 4 11 13 20 NA NA NA
[10,] 4 5 8 15 19 20 21 22
[11,] 3 13 15 18 19 23 NA NA
[12,] 11 13 16 NA NA NA NA NA
[13,] 4 6 14 15 16 17 19 21
[14,] 2 6 13 NA NA NA NA NA
[15,] 3 17 20 NA NA NA NA NA
[16,] 6 15 18 23 NA NA NA NA
[17,] 2 25 NA NA NA NA NA NA
[18,] 2 5 NA NA NA NA NA NA
[19,] 3 11 NA NA NA NA NA NA
[20,] 1 4 7 10 12 21 22 25
[21,] 2 4 6 13 14 16 18 NA
[22,] 1 3 4 15 23 NA NA NA
[23,] 1 16 24 NA NA NA NA NA
[24,] 7 8 19 20 22 NA NA NA
[25,] 7 12 13 17 NA NA NA NA
I want to reorder this matrix based on a selection criteria as follows:
R would be most helpful (but I'm interested in the algorithm, so Python, Ruby, etc. would be great too). The resulting vector will have a length of 115 (8 x 25 = 200, minus 85 NAs = 115) and would look like this, which is basically how the disease would spread if vertex 1 becomes infected:
4,5,18,22,23,24,25,24,1,3,9,13,14,2,5,1,3,4,15,23,1,16,24,7,8,19,20,22,7,12,13,17,7,8,19,20,22, 4,5,18,22,23,24,25,7,11,18,20...
What I know so far:
1. R has a package **igraph** which lets me calculate neighbors(graph, vertex, "out")
2. The same package can also generate get.adjlist(graph...), get.adjacency
Finding a "contagion chain" like this is equivalent to a breadth-first search through the graph, e.g.:
library(igraph)
set.seed(50)
g = erdos.renyi.game(20, 0.1)
plot(g)
order = graph.bfs(g, root=14, order=TRUE, unreachable=FALSE)$order
Output:
> order
[1] 14 1 2 11 16 18 4 19 12 17 20 7 8 15 5 13 9 NaN NaN NaN
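To show the algorithm itself without igraph, here is a minimal base-R sketch of breadth-first search over a neighbour matrix shaped like channel (one row per vertex, padded with NA). The helper names matrix_to_adj and bfs_order and the toy matrix m are assumptions for illustration. Each vertex is reported once, when it is first reached, which differs from the repeated listing in the question.

```r
# Turn a padded neighbour matrix into a list of neighbour vectors
matrix_to_adj <- function(m) {
  lapply(seq_len(nrow(m)), function(i) {
    r <- m[i, ]
    r[!is.na(r)]
  })
}

# Breadth-first search: returns vertices in the order they are visited
bfs_order <- function(adj, root) {
  visited <- rep(FALSE, length(adj))
  visited[root] <- TRUE
  queue <- root
  visit_order <- integer(0)
  while (length(queue) > 0) {
    v <- queue[1]                     # take the oldest vertex in the queue
    queue <- queue[-1]
    visit_order <- c(visit_order, v)
    nb <- adj[[v]]
    nb <- unique(nb[!visited[nb]])    # neighbours not reached yet
    visited[nb] <- TRUE
    queue <- c(queue, nb)             # they are infected in the next wave
  }
  visit_order
}

# Toy graph: 1 -> 2, 3; 2 -> 4; 3 -> 4, 5; 4 -> 1; 5 has no outgoing links
m <- rbind(c(2L, 3L), c(4L, NA), c(4L, 5L), c(1L, NA), c(NA, NA))
bfs_order(matrix_to_adj(m), root = 1L)
# 1 2 3 4 5
```

Applied to the question's data, this would be called as bfs_order(matrix_to_adj(channel), root = 1).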
It's not clear how you define the ordering of the rows, so... just a few hints:
You can select a permutation/combination of rows by passing an index vector:
> (m <- matrix(data=1:9, nrow=3))
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> m[c(2,3,1),]
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 1 4 7
The function t() transposes a matrix.
The matrix is stored in columns-first (or column-major) order:
> as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
NA values can be removed by subsetting:
> qq <- c(1,2,NA,5,7,NA,3,NA,NA)
> qq[!is.na(qq)]
[1] 1 2 5 7 3
Also, graph algorithms are provided by Bioconductor's graph or CRAN's igraph packages.

Matching one column into the other in numerical order in R

I have a datafile:
https://dl.dropbox.com/u/22681355/example.csv
Read file:
example<-read.csv("example.csv")
example<-example[,-1]
example[,1] contains a list of numbers increasing in numerical order.
example[,2] contains another set of numbers
First I would like to identify the numbers in example[,2] that are not listed in example[,1]
diff<-setdiff(example[,2],example[,1])
Now that I know these values I would like to insert them into example[,1] leaving existing values in example[,1] and example[,2] intact.
A short example would be:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
3 4
5 2
5 7
etc etc
After I run setdiff() I get the numbers not in the first column but in the second.
Now I would like to place them in example[,1] to produce the following output:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
2 NA
3 4
4 NA
5 2
5 7
etc etc
So basically placing them in numerical order but leaving everything else intact.
Part 1 excellently solved by Joris Meys!
I have two further questions:
1:
Can the same be done if there is an additional third column but I don't want to do anything with it?
e.g.:
ORIGINAL:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
3 4 15
5 2 20
5 7 9
etc etc
Desired OUTPUT:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 9
etc etc
2:
Instead of putting NA in example[,2] for the rows where example[,1] did not originally contain the value (say example[,1] has no number 30), I would like to look up the row where example[,2] contains 30, see what value example[,1] has in that row, and put that value in example[,2] instead of the NA.
for example:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 9
etc etc
Instead of NA's have:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 5 20
3 4 15
4 3 15
5 2 20
5 7 9
etc etc
So, now that you have made clear what you want: suppose you have the matrix
Example <-
matrix(
c(1,1,1,1,1,3,5,5,1000,50,3,90,25,4,2,7),
ncol=2
)
Then you could do the following :
diffs <- setdiff(Example[,2],Example[,1])
tmps <- rbind(Example,
matrix(
c(diffs,rep(NA,length(diffs))),
ncol=2
)
)
solution <- tmps[order(tmps[,1]),]
Which will give you the following result:
> solution
[,1] [,2]
[1,] 1 1000
[2,] 1 50
[3,] 1 3
[4,] 1 90
[5,] 1 25
[6,] 2 NA
[7,] 3 4
[8,] 4 NA
[9,] 5 2
[10,] 5 7
[11,] 7 NA
...
See the help files ?matrix and ?order.
The following approach also works if your matrix has more than two columns. It's an extension of Joris Meys' solution.
Example <- matrix(c(1,1,1,1,1,3,5,5,
1000,50,3,90,25,4,2,7,37,18,54,72,23,15,20,9),ncol=3)
diffs <- setdiff(Example[,2], Example[,1])
new_mat <- rbind(Example,
matrix(c(diffs,
rep(NA, length(diffs) * (ncol(Example) - 1))),
ncol = ncol(Example)))
solution <- new_mat[order(new_mat[,1]),]
The result:
[,1] [,2] [,3]
[1,] 1 1000 37
[2,] 1 50 18
[3,] 1 3 54
[4,] 1 90 72
[5,] 1 25 23
[6,] 2 NA NA
[7,] 3 4 15
[8,] 4 NA NA
[9,] 5 2 20
[10,] 5 7 9
[11,] 7 NA NA
[12,] 25 NA NA
[13,] 50 NA NA
[14,] 90 NA NA
[15,] 1000 NA NA
Once you have created this matrix, it's easy to generate a new one without NAs:
solution2 <- solution
solution2[is.na(solution2)] <- Example[match(sort(diffs), Example[,2]), -2]
The result:
[,1] [,2] [,3]
[1,] 1 1000 37
[2,] 1 50 18
[3,] 1 3 54
[4,] 1 90 72
[5,] 1 25 23
[6,] 2 5 20
[7,] 3 4 15
[8,] 4 3 15
[9,] 5 2 20
[10,] 5 7 9
[11,] 7 5 9
[12,] 25 1 23
[13,] 50 1 18
[14,] 90 1 72
[15,] 1000 1 37
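The replacement above relies on the NA cells and the replacement values both being traversed in column-major order. A sketch that builds the filler rows explicitly may be easier to follow; it assumes the three-column Example matrix from this answer, and the names src, filler and combined are introduced only for illustration.

```r
Example <- matrix(c(1, 1, 1, 1, 1, 3, 5, 5,
                    1000, 50, 3, 90, 25, 4, 2, 7,
                    37, 18, 54, 72, 23, 15, 20, 9), ncol = 3)

# Values that occur in column 2 but not in column 1
diffs <- sort(setdiff(Example[, 2], Example[, 1]))

# For each such value, the row of Example where it appears in column 2
src <- match(diffs, Example[, 2])

# New rows: the missing value, then columns 1 and 3 of the row it came from
filler <- unname(cbind(diffs, Example[src, 1], Example[src, 3]))

combined  <- rbind(Example, filler)
solution2 <- combined[order(combined[, 1]), ]
solution2
```

This variant is hard-coded for three columns; with more columns the cbind call would need to list the remaining ones as well.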
