I have a bunch of columns which all start on the same row, but I would rather have them all end on the same row. Here is a simplified example:
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- cbind(A,B,C)
Which gives:
A B C
[1,] 2 NA NA
[2,] 7 5 NA
[3,] 3 2 NA
[4,] 5 1 NA
[5,] 5 6 3
[6,] 9 4 6
[7,] 8 6 7
[8,] 1 7 1
[9,] NA NA 5
[10,] NA NA 6
But I want to manipulate this so it is output like this:
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
Couldn't really find a solution on this site. Thanks for any help.
You can try:
apply(Start, 2, function(x) rev(`length<-`(na.omit(rev(x)), nrow(Start))))
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
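The one-liner above can be hard to read, so here is a step-by-step sketch of what it does to a single column (column B from the question). The intermediate names `step1`, `step2` and `result` are only for illustration.

```r
x <- c(NA, 5, 2, 1, 6, 4, 6, 7, NA, NA)  # column B from the question
n <- length(x)

step1 <- rev(x)                     # reverse, so the tail of x comes first
step2 <- as.vector(na.omit(step1))  # drop every NA (and the na.action attribute)
length(step2) <- n                  # pad back to n elements with trailing NAs
result <- rev(step2)                # reverse again: the padding is now at the top

result
```

Because every NA is dropped and then re-added as padding, NAs that sit in the middle of a column would also be moved to the top.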
We can try apply + is.na
apply(Start,2,function(x) c(x[is.na(x)],x[!is.na(x)]))
or
apply(Start,2,function(x) do.call(c,rev(split(x,is.na(x)))))
such that
A B C
[1,] NA NA NA
[2,] NA NA NA
[3,] 2 NA NA
[4,] 7 5 NA
[5,] 3 2 3
[6,] 5 1 6
[7,] 5 6 7
[8,] 9 4 1
[9,] 8 6 5
[10,] 1 7 6
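sort() has an na.last argument that moves the NAs to the front, but note that it also sorts the non-NA values, so on its own it does not keep their original order: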
A <- c(2,7,3,5,5,9,8,1,NA,NA)
B <- c(NA,5,2,1,6,4,6,7,NA,NA)
C <- c(NA,NA,NA,NA,3,6,7,1,5,6)
Start <- as.data.frame(cbind(A,B,C) ) # added "as.data.frame" here ..
do.call(cbind, lapply(Start, sort, na.last = FALSE))
To keep the original order of the non-NA values, overwrite the sorted values with the originals:
do.call(cbind, lapply(Start, function(x) {
res <- sort(x, na.last = FALSE)
res[!is.na(res)] <- x[!is.na(x)]
res
}))
# A B C
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] 2 NA NA
# [4,] 7 5 NA
# [5,] 3 2 3
# [6,] 5 1 6
# [7,] 5 6 7
# [8,] 9 4 1
# [9,] 8 6 5
#[10,] 1 7 6
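As a quick check of how the two sort-based variants differ, here is a small sketch using column A from the question. The names `sorted` and `kept` are only for illustration.

```r
A <- c(2, 7, 3, 5, 5, 9, 8, 1, NA, NA)

# Plain sort(): NAs go first, but the values are reordered as well
sorted <- sort(A, na.last = FALSE)

# Order-preserving variant: reuse the NA-first layout, then put the
# original non-NA values back in their original order
kept <- sorted
kept[!is.na(kept)] <- A[!is.na(A)]

sorted  # NA NA 1 2 3 5 5 7 8 9
kept    # NA NA 2 7 3 5 5 9 8 1
```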
I have several columns with non-overlapping data in them:
a <- c(rep(1, 10), rep(NA, 20))
b <- c(rep(NA, 10), rep(2, 10), rep(NA, 10))
c <- c(rep(NA, 20), rep(3, 10))
data <- cbind(a, b, c)
The output:
a b c
[1,] 1 NA NA
[2,] 1 NA NA
[3,] 1 NA NA
[4,] 1 NA NA
[5,] 1 NA NA
[6,] 1 NA NA
[7,] 1 NA NA
[8,] 1 NA NA
[9,] 1 NA NA
[10,] 1 NA NA
[11,] NA 2 NA
[12,] NA 2 NA
[13,] NA 2 NA
[14,] NA 2 NA
[15,] NA 2 NA
[16,] NA 2 NA
[17,] NA 2 NA
[18,] NA 2 NA
[19,] NA 2 NA
[20,] NA 2 NA
[21,] NA NA 3
[22,] NA NA 3
[23,] NA NA 3
[24,] NA NA 3
[25,] NA NA 3
[26,] NA NA 3
[27,] NA NA 3
[28,] NA NA 3
[29,] NA NA 3
[30,] NA NA 3
How can I collapse these N columns (there are many more than 3) into one using dplyr, so that the result is:
result <- c(rep(1, 10), rep(2, 10), rep(3, 10))
Of course in the real data instead of (1, 2, 3) the actual data is something completely different and its only common property is that it is not NA.
In base R, for numeric data you can use rowSums. Each row holds exactly one non-NA value, so the row sum with NAs removed is that value:
rowSums(data, na.rm = TRUE)
For non-numeric data, max.col can be used to identify the column that holds the non-NA value in each row:
data[cbind(1:NROW(data), max.col(!is.na(data)))]
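To illustrate the max.col approach on non-numeric data, here is a small sketch with a made-up character matrix `chr` (not from the question) that has exactly one non-NA value per row.

```r
# Hypothetical character data: one non-NA value per row
chr <- cbind(
  a = c("x", NA, NA),
  b = c(NA, "y", NA),
  c = c(NA, NA, "z")
)

# !is.na(chr) is TRUE only where a value is present, so max.col()
# returns the index of that column for each row
col_idx <- max.col(!is.na(chr), ties.method = "first")

# Index the matrix with (row, column) pairs to pull out one value per row
collapsed <- chr[cbind(seq_len(nrow(chr)), col_idx)]
collapsed
```

Setting ties.method = "first" only matters for rows with no value or several values; the default breaks such ties at random. Since the question asked about dplyr: with dplyr loaded, coalesce(a, b, c) on the individual vectors should give the same kind of result.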
I am trying to loop over the columns of a matrix and change certain predefined sequences within the columns, which are available in the form of vectors.
Let's say I have the following matrix:
m2 <- matrix(sample(1:36),9,4)
[,1] [,2] [,3] [,4]
[1,] 11 6 1 14
[2,] 22 16 27 3
[3,] 34 10 23 32
[4,] 21 19 31 35
[5,] 17 9 2 4
[6,] 28 18 29 5
[7,] 20 30 13 36
[8,] 26 33 24 15
[9,] 8 12 25 7
As an example my vector of sequence starts is a and my vector of sequence ends is b. Thus the first sequence to delete in all columns is a[1] to b[1], the 2nd a[2] to b[2] and so on.
My testing code is as follows:
testing <- function(x){
apply(x,2, function(y){
a <- c(1,5)
b <- c(2,8)
mapply(function(y){
y[a:b] <- NA; y
},a,b)
})
}
Expected outcome:
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] 34 10 23 32
[4,] 21 19 31 35
[5,] NA NA NA NA
[6,] NA NA NA NA
[7,] NA NA NA NA
[8,] NA NA NA NA
[9,] 8 12 25 7
Actual result:
Error in (function (y) : unused argument (dots[[2]][[1]])
What is wrong in the above code? I know I could just set the rows to NA, but I am trying to get the above output by using nested apply functions to learn more about them.
With 'a' and 'b' defined in the workspace (a <- c(1,5); b <- c(2,8)), we get the sequence between corresponding elements of 'a' and 'b' using Map, unlist the result to create a single index vector, and assign NA to those rows of 'm2'.
m2[unlist(Map(":", a, b)),] <- NA
m2
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] 34 10 23 32
# [4,] 21 19 31 35
# [5,] NA NA NA NA
# [6,] NA NA NA NA
# [7,] NA NA NA NA
# [8,] NA NA NA NA
# [9,] 8 12 25 7
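As for the error in the question: mapply() calls the function with one element from each of a and b, so the function receives two arguments but is declared with only one (y). A minimal sketch of a column-wise version using apply follows; `blank_ranges` is a hypothetical helper name, and set.seed is added only to make the example reproducible.

```r
set.seed(1)
m2 <- matrix(sample(1:36), 9, 4)
a <- c(1, 5)  # sequence starts
b <- c(2, 8)  # sequence ends

blank_ranges <- function(x, starts, ends) {
  # Expand each start/end pair into row indices: 1:2 and 5:8
  idx <- unlist(Map(seq, starts, ends))
  # Apply the same replacement to every column and return the matrix
  apply(x, 2, function(y) {
    y[idx] <- NA
    y
  })
}

res <- blank_ranges(m2, a, b)
res
```

The index vector is built once outside apply because it is the same for every column; only the replacement itself needs to run per column.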
Q. I have an Erdos-Renyi graph. I infect a vertex and want to see what sequence of vertices the disease would follow. igraph has helpful functions like get.adjacency() and neighbors().
Details. This is the adjacency matrix with vertex names instead of 0/1 flags, and I'm trying to get the contagion chain out of it, i.e. the flow/sequence of an epidemic through a graph if a certain vertex is infected. Let's not worry about infection probabilities here (assume all vertices hit are infected with probability 1).
So suppose I hit vertex 1 (which is row 1 here). We see that it has outgoing links to vertex 4,5,18,22,23,24,25. So then the next vertices will be those connected to 4,5,18...25 i.e. those values in row4, row5, row18,... row25. Then, according to the model, the disease will travel through these and so forth.
I understand that I can pass a string to order the matrix rows. My problem is, I cannot figure out how to generate that sequence.
The matrix looks like this.
> channel
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 4 5 18 22 23 24 25 NA
[2,] 6 10 11 18 25 NA NA NA
[3,] 7 11 18 20 NA NA NA NA
[4,] 24 NA NA NA NA NA NA NA
[5,] 1 3 9 13 14 NA NA NA
[6,] 3 8 9 14 19 23 NA NA
[7,] 3 4 8 15 20 22 NA NA
[8,] 2 3 25 NA NA NA NA NA
[9,] 3 4 11 13 20 NA NA NA
[10,] 4 5 8 15 19 20 21 22
[11,] 3 13 15 18 19 23 NA NA
[12,] 11 13 16 NA NA NA NA NA
[13,] 4 6 14 15 16 17 19 21
[14,] 2 6 13 NA NA NA NA NA
[15,] 3 17 20 NA NA NA NA NA
[16,] 6 15 18 23 NA NA NA NA
[17,] 2 25 NA NA NA NA NA NA
[18,] 2 5 NA NA NA NA NA NA
[19,] 3 11 NA NA NA NA NA NA
[20,] 1 4 7 10 12 21 22 25
[21,] 2 4 6 13 14 16 18 NA
[22,] 1 3 4 15 23 NA NA NA
[23,] 1 16 24 NA NA NA NA NA
[24,] 7 8 19 20 22 NA NA NA
[25,] 7 12 13 17 NA NA NA NA
I want to reorder this matrix based on a selection criterion as follows:
R would be most helpful (but I'm interested in the algorithm, so Python, Ruby, etc. would be great too). The resulting vector will have a length of 115 (8 x 25 = 200, minus 85 NAs = 115) and would look like this, which is basically how the disease would spread if vertex 1 becomes infected:
4,5,18,22,23,24,25,24,1,3,9,13,14,2,5,1,3,4,15,23,1,16,24,7,8,19,20,22,7,12,13,17,7,8,19,20,22, 4,5,18,22,23,24,25,7,11,18,20...
What I know so far:
1. R has a package **igraph** which lets me calculate neighbors(graph, vertex, "out")
2. The same package can also generate get.adjlist(graph...), get.adjacency
Finding a "contagion chain" like this is equivalent to a breadth-first search through the graph, e.g.:
library(igraph)
set.seed(50)
g = erdos.renyi.game(20, 0.1)
plot(g)
order = graph.bfs(g, root=14, order=TRUE, unreachable=FALSE)$order
Output:
> order
[1] 14 1 2 11 16 18 4 19 12 17 20 7 8 15 5 13 9 NaN NaN NaN
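The same traversal can also be sketched in base R directly on an NA-padded neighbour matrix laid out like `channel`. The small matrix `adj` and the function name `bfs_order` below are made up for illustration. Note that a breadth-first search visits each vertex once, whereas the 115-element vector in the question repeats vertices.

```r
# Hypothetical 5-vertex graph: row i lists the out-neighbours of vertex i,
# padded with NA (same layout as `channel`)
adj <- rbind(
  c(2, 3, NA),
  c(4, NA, NA),
  c(4, 5, NA),
  c(1, NA, NA),
  c(NA, NA, NA)
)

bfs_order <- function(adj, root) {
  visited <- logical(nrow(adj))
  visited[root] <- TRUE
  queue <- root
  visit_order <- integer(0)
  while (length(queue) > 0) {
    v <- queue[1]                   # take the next vertex from the queue
    queue <- queue[-1]
    visit_order <- c(visit_order, v)
    nb <- adj[v, ]
    nb <- nb[!is.na(nb)]            # drop the NA padding
    nb <- unique(nb[!visited[nb]])  # keep only vertices not seen yet
    visited[nb] <- TRUE
    queue <- c(queue, nb)           # enqueue them for the next level
  }
  visit_order
}

bfs_order(adj, 1)
```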
It's not clear how you define the ordering of the rows, so... just a few hints:
You can select a permutation/combination of rows by passing an index vector:
> (m <- matrix(data=1:9, nrow=3))
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> m[c(2,3,1),]
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 1 4 7
The function t() transposes a matrix.
The matrix is stored in columns-first (or column-major) order:
> as.vector(m)
[1] 1 2 3 4 5 6 7 8 9
NA values can be removed by subsetting:
> qq <- c(1,2,NA,5,7,NA,3,NA,NA)
> qq[!is.na(qq)]
[1] 1 2 5 7 3
Also, graph algorithms are provided by Bioconductor's graph or CRAN's igraph packages.