Matching one column into the other in numerical order in R - r

I have a datafile:
https://dl.dropbox.com/u/22681355/example.csv
Read file:
example<-read.csv("example.csv")
example<-example[,-1]
example[,1] contains a list of numbers increasing in numerical order.
example[,2] contains another set of numbers
First I would like to identify the numbers in example[,2] that are not listed in example[,1]
diff<-setdiff(example[,2],example[,1])
Now that I know these values I would like to insert them into example[,1] leaving existing values in example[,1] and example[,2] intact.
A short example would be:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
3 4
5 2
5 7
etc etc
After I run setdiff() I get the numbers not in the first column but in the second.
Now I would like to place them in example[,1] to produce the following output:
Example[,1] Example[,2]
1 1000
1 50
1 3
1 90
1 25
2 NA
3 4
4 NA
5 2
5 7
etc etc
So basically placing them in numerical order but leaving everything else intact.
Part 1 excellently solved by Joris Meys!
I have two further questions:
/////////////////////////////////////////////
////////////////////////////////////////////
1:
Can the same be done if there is an additional third column but I don't want to do anything with it?
e.g.:
ORIGINAL
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
3 4 15
5 2 20
5 7 9
etc etc
Desired OUTPUT:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 9
etc etc
2:
Instead of putting NA in example[,2] for the inserted rows, I would like to fill in a value. For example, if example[,1] doesn't have the number '30', I would like to find the row where example[,2] has the number '30', see what value example[,1] has in that row, and put that value in example[,2] of the new row instead of the NA.
for example:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 NA NA
3 4 15
4 NA NA
5 2 20
5 7 9
etc etc
Instead of NA's have:
Example[,1] Example[,2] Example[,3]
1 1000 37
1 50 18
1 3 54
1 90 72
1 25 23
2 5 20
3 4 15
4 3 15
5 2 20
5 7 9
etc etc

So, after you made clear what you want, this means you have a matrix
Example <-
matrix(
c(1,1,1,1,1,3,5,5,1000,50,3,90,25,4,2,7),
ncol=2
)
Then you could do the following :
diffs <- setdiff(Example[,2],Example[,1])
tmps <- rbind(Example,
matrix(
c(diffs,rep(NA,length(diffs))),
ncol=2
)
)
solution <- tmps[order(tmps[,1]),]
Which will give you the following result:
> solution
[,1] [,2]
[1,] 1 1000
[2,] 1 50
[3,] 1 3
[4,] 1 90
[5,] 1 25
[6,] 2 NA
[7,] 3 4
[8,] 4 NA
[9,] 5 2
[10,] 5 7
[11,] 7 NA
...
See the help files ?matrix and ?order.
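Since the question reads the data with read.csv(), the same idea can also be sketched on a data frame instead of a matrix. The column names a and b below are placeholders chosen for illustration (the real column names of example.csv are not shown in the question), so treat this as a sketch under that assumption:

```r
# Hypothetical stand-in for the two columns of the CSV file
example <- data.frame(a = c(1, 1, 1, 1, 1, 3, 5, 5),
                      b = c(1000, 50, 3, 90, 25, 4, 2, 7))

# Values that occur in the second column but not in the first
missing_vals <- setdiff(example$b, example$a)

# Append one row per missing value, with NA in the second column
filled <- rbind(example,
                data.frame(a = missing_vals, b = NA_real_))

# order() keeps ties in their original order, so the existing rows
# stay in the sequence they had before
filled <- filled[order(filled$a), ]
rownames(filled) <- NULL
filled
```

Because order() leaves tied values in their original positions, the five rows with a == 1 keep the order they had in the input.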

The following approach also works if your matrix has more than two columns. It's an extension of Joris Meys' solution.
Example <- matrix(c(1,1,1,1,1,3,5,5,
1000,50,3,90,25,4,2,7,37,18,54,72,23,15,20,9),ncol=3)
diffs <- setdiff(Example[,2], Example[,1])
new_mat <- rbind(Example,
matrix(c(diffs,
rep(NA, length(diffs) * (ncol(Example) - 1))),
ncol = ncol(Example)))
solution <- new_mat[order(new_mat[,1]),]
The result:
[,1] [,2] [,3]
[1,] 1 1000 37
[2,] 1 50 18
[3,] 1 3 54
[4,] 1 90 72
[5,] 1 25 23
[6,] 2 NA NA
[7,] 3 4 15
[8,] 4 NA NA
[9,] 5 2 20
[10,] 5 7 9
[11,] 7 NA NA
[12,] 25 NA NA
[13,] 50 NA NA
[14,] 90 NA NA
[15,] 1000 NA NA
Once you have created this matrix, it's easy to generate a new one without NAs:
solution2 <- solution
solution2[is.na(solution2)] <- Example[match(sort(diffs), Example[,2]), -2]
The result:
[,1] [,2] [,3]
[1,] 1 1000 37
[2,] 1 50 18
[3,] 1 3 54
[4,] 1 90 72
[5,] 1 25 23
[6,] 2 5 20
[7,] 3 4 15
[8,] 4 3 15
[9,] 5 2 20
[10,] 5 7 9
[11,] 7 5 9
[12,] 25 1 23
[13,] 50 1 18
[14,] 90 1 72
[15,] 1000 1 37
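The one-line replacement above relies on R filling the NA cells in column-major order. A more explicit variant, offered only as a sketch of the same idea, builds the new rows first and then sorts. The names src_rows and new_rows are introduced here for illustration:

```r
Example <- matrix(c(1, 1, 1, 1, 1, 3, 5, 5,
                    1000, 50, 3, 90, 25, 4, 2, 7,
                    37, 18, 54, 72, 23, 15, 20, 9), ncol = 3)

# Values present in column 2 but absent from column 1
diffs <- sort(setdiff(Example[, 2], Example[, 1]))

# For each missing value, the first row in which it appears in column 2
src_rows <- match(diffs, Example[, 2])

# New rows: the missing value, then columns 1 and 3 of the row it came from
new_rows <- unname(cbind(diffs, Example[src_rows, 1], Example[src_rows, 3]))

solution2 <- rbind(Example, new_rows)
solution2 <- solution2[order(solution2[, 1]), ]
solution2
```

Note that match() returns only the first matching row, so if a value occurs several times in column 2, the first occurrence is the one that is used.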

Related

pheatmap: color each row of a matrix independently using a gradient from max to min

I am trying to show the colors for each row in my matrix (min to max gradient):
[1,] 8 12 11 5 24
[2,] 1 2 7 0 8
[3,] 53 99 1501 15 9859
[4,] 59 24 19 19 32
[5,] 4 2 11 0 68
[6,] 4 9 177 2 710
[7,] 2 1 2 2 3
[8,] 0 5 133 0 2195
[9,] 3 3 2 1 15
[10,] 0 1 0 0 14
[11,] 0 3 21 0 17
[12,] 2 1 2 0 6
[13,] 11 26 22 3 16
[14,] 6 38 217 1 354
[15,] 3 10 17 0 68
[16,] 3 3 12 2 19
[17,] 7 5 26 1 40
[18,] 1 0 6 0 27
[19,] 1 0 37 0 434
[20,] 30 15 20 9 27
My code is as follows:
RT<-read.table("tmp",header=FALSE,sep="\t");
mat <-data.frame(RT)
pheatmap(mat, scale = "row",cluster_cols=FALSE,cluster_rows = FALSE, row.names =FALSE);
dev.off()
I have received the following image:
As you can see, in the 2nd row the max should be red, but it is orange because columns 3 and 5 of the 2nd row have similar values. I want the max to be red, the min to be blue, and a red-yellow-blue gradient for the values between max and min, for each row independently. 0's should always be blue. Only identical values should get the same color. I have tried other solutions from "R: Row scaling not working correctly for heatmap", but they didn't work for me. Please help.
I tried a few things and I think I may have an answer that is close:
A) divide each row with its max.
xx <- t(apply(mat, 1, function(i) i / max(i)))  # apply() over rows returns them as columns, so transpose back
B) minimum is always 0. So, I converted the minimum of each row to 0.
xx1<-t(apply(xx, 1, function(x) replace(x, x== min(x), 0.0)))
pheatmap(xx1, scale = "none",cluster_cols=FALSE,cluster_rows = FALSE, row.names =FALSE);
dev.off()
This seems to be working but it can be done better.
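One possible refinement, sketched here under the assumption that a linear min-max rescaling per row is what is wanted, maps each row's minimum to 0 and its maximum to 1 before plotting with scale = "none". The helper name rescale_rows is made up for this example, and sending constant rows to 0 is an arbitrary choice:

```r
# Rescale every row to [0, 1]: row minimum -> 0, row maximum -> 1
rescale_rows <- function(m) {
  m <- as.matrix(m)
  t(apply(m, 1, function(x) {
    rng <- range(x)
    # A constant row has no gradient; mapping it to 0 is an assumption
    if (rng[1] == rng[2]) return(rep(0, length(x)))
    (x - rng[1]) / (rng[2] - rng[1])
  }))
}

# First three rows of the matrix from the question
mat <- rbind(c(8, 12, 11, 5, 24),
             c(1, 2, 7, 0, 8),
             c(53, 99, 1501, 15, 9859))
scaled <- rescale_rows(mat)

# Plotting step, left commented out so the sketch runs without pheatmap:
# pheatmap::pheatmap(scaled, scale = "none",
#                    cluster_rows = FALSE, cluster_cols = FALSE)
```

With this scaling, strongly skewed rows (such as the third one) will still show most cells near the low end of the palette, because the mapping is linear in the values.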

Creating contingency table from matrix

I have a matrix like so:
country cLabel
[1,] 3 1
[2,] 6 2
[3,] 8 1
[4,] 5 2
[5,] 5 2
[6,] 8 2
[7,] 8 2
[8,] 8 2
[9,] 8 2
[10,] 4 2
[11,] 6 2
[12,] 3 2
[13,] 5 2
[14,] 5 1
country is a value of 1-8, and cLabel is a value of 1-2. How can I print the contingency table for this? I tried print(table(myMatrix)).
It is printing
1 2 3 4 5 6 7 8
60 277 31 32 83 39 24 44
and what I want is it to print each country value (1-8) and how many 1s and 2s there are for each of these 8 values.
I guess there is a duplicate somewhere.
# Turn your matrix into a data.frame, easier to manipulate and to create labels
myDataFrame <- as.data.frame(myMatrix)
# Add factors to country, from 1 to 8. This will add missing levels to the final result
myDataFrame$country <- factor(myDataFrame$country, 1:8)
# Table
table(myDataFrame)
# cLabel
# country 1 2
# 1 0 0
# 2 0 0
# 3 1 1
# 4 0 1
# 5 1 3
# 6 0 2
# 7 0 0
# 8 1 4
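The original attempt, table(myMatrix), counts every cell of the matrix as one pooled vector, which is why it returned a single row of counts. As a sketch of an alternative that stays with the matrix, the two columns can be passed to table() separately. The matrix below is rebuilt from the 14 rows shown in the question:

```r
# The 14 rows printed in the question
myMatrix <- cbind(country = c(3, 6, 8, 5, 5, 8, 8, 8, 8, 4, 6, 3, 5, 5),
                  cLabel  = c(1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1))

# Cross-tabulate the two columns; explicit levels keep empty countries in the table
tab <- table(country = factor(myMatrix[, "country"], levels = 1:8),
             cLabel  = factor(myMatrix[, "cLabel"],  levels = 1:2))
tab
```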

reordering rows of matrix by data sequence inside [duplicate]

Q. I have an Erdős-Rényi graph. I infect a vertex and want to see what sequence of vertices the disease would follow. igraph has helpful functions like get.adjacency() and neighbors().
Details. This is the adjacency matrix with vertex names instead of 0/1 flags, and I'm trying to get the contagion chain out of it, i.e. the flow/sequence of an epidemic through a graph if a certain vertex is infected. Let's not worry about infection probabilities here (assume all vertices hit are infected with probability 1).
So suppose I hit vertex 1 (which is row 1 here). We see that it has outgoing links to vertex 4,5,18,22,23,24,25. So then the next vertices will be those connected to 4,5,18...25 i.e. those values in row4, row5, row18,... row25. Then, according to the model, the disease will travel through these and so forth.
I understand that I can pass a string to order the matrix rows. My problem is, I cannot figure out how to generate that sequence.
The matrix looks like this.
> channel
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 4 5 18 22 23 24 25 NA
[2,] 6 10 11 18 25 NA NA NA
[3,] 7 11 18 20 NA NA NA NA
[4,] 24 NA NA NA NA NA NA NA
[5,] 1 3 9 13 14 NA NA NA
[6,] 3 8 9 14 19 23 NA NA
[7,] 3 4 8 15 20 22 NA NA
[8,] 2 3 25 NA NA NA NA NA
[9,] 3 4 11 13 20 NA NA NA
[10,] 4 5 8 15 19 20 21 22
[11,] 3 13 15 18 19 23 NA NA
[12,] 11 13 16 NA NA NA NA NA
[13,] 4 6 14 15 16 17 19 21
[14,] 2 6 13 NA NA NA NA NA
[15,] 3 17 20 NA NA NA NA NA
[16,] 6 15 18 23 NA NA NA NA
[17,] 2 25 NA NA NA NA NA NA
[18,] 2 5 NA NA NA NA NA NA
[19,] 3 11 NA NA NA NA NA NA
[20,] 1 4 7 10 12 21 22 25
[21,] 2 4 6 13 14 16 18 NA
[22,] 1 3 4 15 23 NA NA NA
[23,] 1 16 24 NA NA NA NA NA
[24,] 7 8 19 20 22 NA NA NA
[25,] 7 12 13 17 NA NA NA NA
I want to reorder this matrix based on a selection criterion as follows:
An R solution would be most helpful (but I'm interested in the algorithm, so Python, Ruby, etc. would be great too). The resulting vector would have a length of 115 (8 x 25 = 200, minus 85 NAs = 115) and would look like this, which is basically how the disease would spread if vertex 1 becomes infected:
4,5,18,22,23,24,25,24,1,3,9,13,14,2,5,1,3,4,15,23,1,16,24,7,8,19,20,22,7,12,13,17,7,8,19,20,22, 4,5,18,22,23,24,25,7,11,18,20...
What I know so far:
1. R has a package **igraph** which lets me calculate neighbors(graph, vertex, "out")
2. The same package can also generate get.adjlist(graph...), get.adjacency
Finding a "contagion chain" like this is equivalent to a breadth-first search through the graph, e.g.:
library(igraph)
set.seed(50)
g = erdos.renyi.game(20, 0.1)
plot(g)
order = graph.bfs(g, root=14, order=TRUE, unreachable=FALSE)$order
Output:
> order
[1] 14 1 2 11 16 18 4 19 12 17 20 7 8 15 5 13 9 NaN NaN NaN
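If only the neighbour matrix is available (as with channel in the question), the same breadth-first idea can be sketched in base R without building an igraph object. The function name contagion_order and the small 5-vertex matrix are made up for illustration, and the sketch assumes each vertex is infected only once:

```r
# Breadth-first traversal over a neighbour matrix padded with NA,
# where row i lists the vertices reachable from vertex i
contagion_order <- function(channel, start) {
  visited <- rep(FALSE, nrow(channel))
  visited[start] <- TRUE
  queue <- start
  order_seen <- numeric(0)
  while (length(queue) > 0) {
    v <- queue[1]                      # take the oldest vertex in the queue
    queue <- queue[-1]
    order_seen <- c(order_seen, v)
    nb <- channel[v, ]
    nb <- nb[!is.na(nb)]               # drop the NA padding
    nb <- unique(nb[!visited[nb]])     # keep only vertices not yet infected
    visited[nb] <- TRUE
    queue <- c(queue, nb)
  }
  order_seen
}

# Hypothetical 5-vertex example in the same layout as channel
small <- rbind(c(2, 3, NA),
               c(4, NA, NA),
               c(4, 5, NA),
               c(1, NA, NA),
               c(NA, NA, NA))
ord <- contagion_order(small, 1)

# Concatenate the neighbour lists in infection order, without the NAs
chain <- unlist(lapply(ord, function(v) {
  x <- small[v, ]
  x[!is.na(x)]
}))
```

Here ord is the order in which vertices become infected, and chain lists the outgoing links row by row in that order, which is close to the vector described in the question when every vertex is reachable.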
It's not clear how you define the ordering of the rows, so... just a few hints:
You can select a permutation/combination of rows by passing an index vector:
> (m <- matrix(data=1:9, nrow=3))
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> m[c(2,3,1),]
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 1 4 7
The function t() transposes a matrix.
The matrix is stored in columns-first (or column-major) order:
> as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
NA values can be removed by subsetting:
> qq <- c(1,2,NA,5,7,NA,3,NA,NA)
> qq[!is.na(qq)]
[1] 1 2 5 7 3
Also, graph algorithms are provided by Bioconductor's graph or CRAN's igraph packages.

melt the lower half matrix in R

How can I melt a lower half triangle plus diagonal matrix ?
11 NA NA NA NA
12 22 NA NA NA
13 23 33 NA NA
14 24 34 44 NA
15 25 35 45 55
A <- t(matrix (c(11, NA, NA, NA, NA, 12, 22, NA, NA, NA,
13, 23, 33, NA, NA, 14, 24, 34, 44, NA,15, 25,
35, 45, 55), ncol = 5))
> A
[,1] [,2] [,3] [,4] [,5]
[1,] 11 NA NA NA NA
[2,] 12 22 NA NA NA
[3,] 13 23 33 NA NA
[4,] 14 24 34 44 NA
[5,] 15 25 35 45 55
To data.frame in row and col (preserving the following order)
col row value
1 1 11
1 2 12
1 3 13
1 4 14
1 5 15
2 2 22
2 3 23
2 4 24
2 5 25
3 3 33
3 4 34
3 5 35
4 4 44
4 5 45
5 5 55
If you want the indices as columns as well, this should work:
m <- matrix(1:25,5,5)
m[upper.tri(m)] <- NA
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 NA NA NA NA
[2,] 2 7 NA NA NA
[3,] 3 8 13 NA NA
[4,] 4 9 14 19 NA
[5,] 5 10 15 20 25
cbind(which(!is.na(m),arr.ind = TRUE),na.omit(as.vector(m)))
row col
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 4 1 4
[5,] 5 1 5
[6,] 2 2 7
[7,] 3 2 8
[8,] 4 2 9
[9,] 5 2 10
[10,] 3 3 13
[11,] 4 3 14
[12,] 5 3 15
[13,] 4 4 19
[14,] 5 4 20
[15,] 5 5 25
I guess I'll explain this a bit. I'm using three "tricks":
The arr.ind argument to which to get the indices
The very useful na.omit function to avoid some extra typing
The fact that R stores matrices in column major form, hence as.vector returns the values in the right order.
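To get the col, row, value layout requested in the question as a data frame, the same tricks can be combined as in the following sketch. It uses lower.tri() to pick the cells rather than relying on where the NAs are, and the names idx and melted are made up for this example:

```r
# The matrix from the question
A <- rbind(c(11, NA, NA, NA, NA),
           c(12, 22, NA, NA, NA),
           c(13, 23, 33, NA, NA),
           c(14, 24, 34, 44, NA),
           c(15, 25, 35, 45, 55))

# Row/column positions of the lower triangle (including the diagonal),
# returned in column-major order
idx <- which(lower.tri(A, diag = TRUE), arr.ind = TRUE)

# Index the matrix with the two-column position matrix to pull the values
melted <- data.frame(col   = idx[, "col"],
                     row   = idx[, "row"],
                     value = A[idx])
melted
```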
My one liner.
reshape2::melt(A, varnames = c('row', 'col'), na.rm = TRUE)
Here's my first solution:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- as.vector(test) ## "melt" it into a vector
test <- cbind( test2[!is.na(test2)] ) ## get rid of NAs, cbind it into a column
Results are:
> test
[,1]
[1,] 11
[2,] 12
[3,] 13
[4,] 14
[5,] 15
[6,] 22
[7,] 23
[8,] 24
[9,] 25
[10,] 33
[11,] 34
[12,] 35
[13,] 44
[14,] 45
[15,] 55
Alternatively, you can use the matrix command:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- matrix(test, ncol=1)
test <- cbind( test2[!is.na(test2), ] )
## same as above, except now explicitly noting rows to replace.
Here is my attempt:
# enter the data
df <- c(11,12,13,14,15,NA,22,23,24,25,NA,NA,33,34,35,NA,NA,NA,44,45,NA,NA,NA,NA,55)
dim(df) <- c(5,5)
df
# make new data frame with rows and column indicators
melteddf <- data.frame(
value=df[lower.tri(df,diag=TRUE)],
col=rep(1:ncol(df),ncol(df):1),
row=unlist(sapply(1:nrow(df),function(x) x:nrow(df)))
)
I wish I had known about the arr.ind argument of which() before now, though.
Here is a method using arrayInd, which is basically the same as @joran's but might be useful in other settings:
na.omit( data.frame(arrayInd(1:prod(dim(A)), dim(A)), value=c(A)) )
X1 X2 value
1 1 1 11
2 2 1 12
3 3 1 13
4 4 1 14
5 5 1 15
7 2 2 22
8 3 2 23
9 4 2 24
10 5 2 25
13 3 3 33
14 4 3 34
15 5 3 35
19 4 4 44
20 5 4 45
25 5 5 55
