How to get a specific column from a matrix in R?

I have the following matrix. How can I extract the desired column using `[` called as a function?
MX <- matrix(101:112,ncol=3)
MX[,2]
# [1] 105 106 107 108
`[`(MX, c(1:4,2))
# [1] 101 102 103 104 102
Obviously, this does not extract the 2nd column as one might guess: c(1:4,2) is a single index vector, so the call returns elements 1 to 4 followed by element 2.
What I am really asking is how to express MX[,2] using `[` as a function.
Please advise, thanks.

Leave the row index blank:
`[`(MX, ,2)
#[1] 105 106 107 108
Or, if we need to extract selected rows (1:4) of a specific column (2), pass the row index and the column index as separate arguments without concatenating them. c() merges the row and column indices into a single vector instead of two:
`[`(MX, 1:4, 2)
#[1] 105 106 107 108
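A minimal sketch of where the function form becomes handy, assuming you have several matrices stored in a list (the list `mats` and its second element are made up for illustration). Because `[` is an ordinary function, it can be handed to lapply() with the row argument left empty:

```r
MX <- matrix(101:112, ncol = 3)

# Empty row argument: equivalent to MX[, 2]
col2 <- `[`(MX, , 2)

# drop = FALSE keeps the result as a 4 x 1 matrix instead of a vector
col2_mat <- `[`(MX, , 2, drop = FALSE)

# Hypothetical list of matrices; the empty argument is passed through
# lapply() to `[`, so each call is effectively m[, 2]
mats <- list(a = MX, b = MX * 2L)
second_cols <- lapply(mats, `[`, , 2)
```

Here second_cols is a list with one vector per matrix, each holding that matrix's 2nd column.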

Related

subset of dataframe in R using a predefined list of names

I have a list of gene names called "COMBO_mk_plt_genes_labels" and a dataframe of marker genes called "Marker_genes_41POS_12_libraries_test_1" containing genes and fold changes.
I want to extract the names of the genes in COMBO_mk_plt_genes_labels that also appear in the dataframe.
I know that the which() function in R gives the positions of the matching genes (see my example below). How do I extract the names and not only the positions?
print(head(Marker_genes_41POS_12_libraries_test_1))
                 p_val avg_logFC pct.1 pct.2     p_val_adj
HBD      6.610971e-108 3.3357135 0.930 0.080 2.419682e-103
GP1BB     1.332211e-91 2.5397301 0.825 0.047  4.876024e-87
CMTM5     1.938091e-63 2.0580724 0.605 0.005  7.093606e-59
SH3BGRL3  1.067771e-60 1.3750032 0.975 0.592  3.908149e-56
PF4       1.899932e-60 3.0111590 0.371 0.000  6.953941e-56
FTH1      4.242081e-58 0.8947325 0.996 0.905  1.552644e-53
COMBO_mk_plt_genes=read.csv(file = "combined_Mk_Plt_genes_list.csv", row.names = ,1)
COMBO_mk_plt_genes_labels=COMBO_mk_plt_genes[,1]
print(head(COMBO_mk_plt_genes_labels))
[1] "CMTM5" "GP9" "CLEC1B" "LTBP1" "C12orf39" "CAMK1"
PLT_genes_in_dataframe= which(rownames(Marker_genes_41POS_12_libraries_test_1) %in% COMBO_mk_plt_genes_labels)
print(PLT_genes_in_dataframe)
[1] 2 3 5 8 11 12 13 20 22 23 24 27 32 38 39 42
[17] 48 60 61 66 68 75 77 92 93 108 112 145 158 175 188 196
[33] 203 214 236 253 261 307 308 1004 1017
I want the names of the elements not the positions. Any advice is appreciated.
You can use the base intersect():
intersect(rownames(Marker_genes_41POS_12_libraries_test_1), COMBO_mk_plt_genes_labels)
intersect() outputs the items that match between the two sequences of items.
Run ?intersect or ?base::intersect for more information.
Alternative solution: Getting element names with your which() approach
You can still use which() to get the element names. which() returns the positions at which rownames(Marker_genes_41POS_12_libraries_test_1) matches an entry of COMBO_mk_plt_genes_labels, so you can use those positions to index the row names and get the names that matched:
rownames(Marker_genes_41POS_12_libraries_test_1)[which(rownames(Marker_genes_41POS_12_libraries_test_1) %in% COMBO_mk_plt_genes_labels)]
# or in short
rownames(Marker_genes_41POS_12_libraries_test_1)[PLT_genes_in_dataframe]
intersect(), though, is a simpler approach.
However, there is one difference you need to be aware of: duplicated items. If rownames(...) (let's call it x) has duplicates that match items in the second vector y, intersect(x, y) returns each matching name only once. In contrast, x[which(x %in% y)] (i.e., the which() approach) returns every element of x that matches y, duplicates included. Swap x and y and you get the duplicated y names instead, using y[which(y %in% x)]. This is useful for something like tallying how many times each name matched.
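To make the duplicate behaviour concrete, here is a small sketch on made-up vectors (the gene names in x and y are illustrative stand-ins, not the asker's data). It also shows that which() is optional, since a logical vector can index directly:

```r
# Illustrative stand-ins for the row names (x) and the label list (y)
x <- c("HBD", "CMTM5", "PF4", "CMTM5")
y <- c("CMTM5", "GP9", "PF4")

# intersect() drops duplicates
common <- intersect(x, y)

# Logical indexing keeps every matching element of x, duplicates included;
# x[which(x %in% y)] gives the same result
matched <- x[x %in% y]

# Tally how often each name matched
tally <- table(matched)
```

With these inputs, common has two elements while matched has three, because "CMTM5" occurs twice in x.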

Reading dataframe or matrix values diagonally, similar to a process pipeline in a parallel system

I have a matrix or data table that looks like this:
time node1 node2 node3
1 100 200 300
2 101 245 329
3 90 245 350
4 129 320 290
5 79 270 320
I want to read this matrix as follows:
In the first run: 1, 101, 245, 290, assigned to some vector.
In the second run: 2, 90, 320, 320, assigned to some vector.
In the third run: 3, 129, 270, assigned to some vector.
I can then use these vectors for mathematical calculations at a later stage.
The process is similar to a pipeline where every stage gives an output per clock tick.
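This will get you most of the way there. You can index a data.frame using a two-column matrix: the first column gives the row and the second gives the column of each element to extract. It looks like you want to create diagonal vectors. Here pmin() clamps the row index to the last row, so the final run repeats the last row instead of running off the end: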
vecs <- lapply(1:3, function(i) df[cbind(pmin(i + 0:3, nrow(df)), 1:4)])
> vecs
[[1]]
[1] 1 101 245 290
[[2]]
[1] 2 90 320 320
[[3]]
[1] 3 129 270 320
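As a variation on the idea above, the following sketch drops positions that fall past the last row instead of clamping them, which reproduces the three-value third run from the question. The data frame is rebuilt from the table in the question; the helper name diag_run is made up for illustration.

```r
df <- data.frame(
  time  = 1:5,
  node1 = c(100, 101, 90, 129, 79),
  node2 = c(200, 245, 245, 320, 270),
  node3 = c(300, 329, 350, 290, 320)
)

# Hypothetical helper: read the diagonal starting at row i, one column per step
diag_run <- function(d, i) {
  rows <- i + seq_len(ncol(d)) - 1   # row i for column 1, i + 1 for column 2, ...
  keep <- rows <= nrow(d)            # discard steps that run past the last row
  as.matrix(d)[cbind(rows[keep], which(keep))]
}

runs <- lapply(1:3, diag_run, d = df)
```

runs[[1]] and runs[[2]] have four values each, and runs[[3]] has three because the fourth step would need a sixth row.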

R: Select multiple values from vector of sequences

In R I'm trying to figure out how to select multiple values from a predefined vector of sequences (e.g. indices = c(1:3, 4:6, 10:12, ...)). In other words, if I want a new vector with the 3rd, 5th, and 7th entries in "indices", what syntax should I use to get back a vector with just those sequences intact, e.g. c(10:12, ...)?
If I understand correctly, you want the 3rd, 5th, and 7th entry in c(1:3, 4:6, 10:12, ...), which means you want to extract specific sets of indices from a vector.
When you do something like c(1:3, 4:6, ...), the resulting vector isn't what it sounds like you want. Instead, use list(1:3, 4:6, ...). Then you can do this:
indices <- list(1:3, 4:6, 10:12, 14:16, 18:20)
x <- rnorm(100)
x[c(indices[[3]], indices[[5]])]
This is equivalent to:
x[c(10:12, 18:20)]
That is in turn equivalent to:
x[c(10, 11, 12, 18, 19, 20)]
Please let me know if I've misinterpreted your question.
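A small sketch of how the list approach generalizes when you want several entries at once: select the list entries by position and flatten them with unlist(). The vector x below is a made-up stand-in, and only the 3rd and 5th entries are picked because the example list has five entries.

```r
indices <- list(1:3, 4:6, 10:12, 14:16, 18:20)
x <- seq(1000, by = 1, length.out = 100)  # illustrative data: 1000, 1001, ...

# Pick the 3rd and 5th sequences, then flatten them into one index vector
picked <- unlist(indices[c(3, 5)])

result <- x[picked]
```

indices[c(3, 5)] keeps the sequences intact as list elements, so you can also loop over them individually before flattening.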
What you are looking for is how to subset data. Most commonly this is done using square bracket notation.
Sample data:
my_vector <- c(100:120)
my_vector
# 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
Values you want taken out:
indices <- c(1:3, 4:6, 10:12)
indices
# 1 2 3 4 5 6 10 11 12
Subsetting using bracket notation:
my_vector[indices]
# 100 101 102 103 104 105 109 110 111
There is also a function called subset() that can do this, although it expects a logical condition rather than a vector of positions.

data.matrix() when character involved

To calculate the highest contribution of a row per ID, I have a script that works when the IDs are numeric. Today, however, I found out that IDs can also contain characters (for instance ABC10101). For the function to work, the dataset is converted to a matrix, but data.matrix(df) does not support characters. Can the code be altered so that the function works with all kinds of IDs (character, numeric, etc.)? For now I wrote a quick workaround that converts character IDs to numeric, but that will slow the process down for large datasets.
Example with code (the function extracts the first entry with the highest contribution, so if 2 entries have the same contribution it selects the first):
Note: in this example ID is interpreted as a factor and data.matrix() converts it to a numeric value. In the code below the ID column should be of type character and the output should be as shown at the bottom. The order of the IDs must remain the same.
tc <- textConnection('
ID contribution uniqID
ABCUD022221 40 101
ABCUD022221 40 102
ABCUD022222 20 103
ABCUD022222 10 104
ABCUD022222 90 105
ABCUD022223 75 106
ABCUD022223 15 107
ABCUD022223 10 108 ')
df <- read.table(tc,header=TRUE)
#Function that needs to be altered
uniqueMaxContr <- function(m, ID = 1, contribution = 2) {
  t(
    vapply(
      split(1:nrow(m), m[,ID]),
      function(i, x, contribution) x[i, , drop=FALSE][which.max(x[i,contribution]),],
      m[1,], x=m, contribution=contribution
    )
  )
}
df<-data.matrix(df) #only works when ID is numeric
highestdf<-uniqueMaxContr(df)
highestdf<-as.data.frame(highestdf)
In this case the outcome should be:
ID contribution uniqID
ABCUD022221 40 101
ABCUD022222 90 105
ABCUD022223 75 106
Others might be able to make it more concise, but this is my attempt at a data.table solution:
tc <- textConnection('
ID contribution uniqID
ABCUD022221 40 101
ABCUD022221 40 102
ABCUD022222 20 103
ABCUD022222 10 104
ABCUD022222 90 105
ABCUD022223 75 106
ABCUD022223 15 107
ABCUD022223 10 108 ')
df <- read.table(tc,header=TRUE)
library(data.table)
dt <- as.data.table(df)
setkey(dt,uniqID)
dt2 <- dt[,list(contribution=max(contribution)),by=ID]
setkeyv(dt2,c("ID","contribution"))
setkeyv(dt,c("ID","contribution"))
dt[dt2,mult="first"]
## ID contribution uniqID
## [1,] ABCUD022221 40 101
## [2,] ABCUD022222 90 105
## [3,] ABCUD022223 75 106
EDIT -- more concise solution
You can use .SD, which is the subset of the data.table for the current group, and then use which.max to extract a single row from it.
In one line:
dt[,.SD[which.max(contribution)],by=ID]
## ID contribution uniqID
## [1,] ABCUD022221 40 101
## [2,] ABCUD022222 90 105
## [3,] ABCUD022223 75 106
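For comparison, here is a base R sketch that avoids the matrix conversion altogether, so character IDs are not a problem. This is an alternative to the data.table answer above, not part of it; the helper name first_max is made up, and the data frame is built directly rather than read from a text connection.

```r
df <- data.frame(
  ID = rep(c("ABCUD022221", "ABCUD022222", "ABCUD022223"), times = c(2, 3, 3)),
  contribution = c(40, 40, 20, 10, 90, 75, 15, 10),
  uniqID = 101:108,
  stringsAsFactors = FALSE
)

# Hypothetical helper: which.max() returns the first maximum, so ties go to
# the earliest row within each group
first_max <- function(d) d[which.max(d$contribution), , drop = FALSE]

# Fixing the factor levels to unique(df$ID) keeps the original ID order
groups <- split(df, factor(df$ID, levels = unique(df$ID)))
highest <- do.call(rbind, lapply(groups, first_max))
rownames(highest) <- NULL
```

This is likely slower than data.table on large inputs, but it has no dependencies and keeps ID as a character column.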

R: how to detect a pattern in a matrix by row

I have a big matrix with 4 columns, containing values normalized by column (mean ~ 0 and standard deviation = 1).
I would like to see if there is a pattern in the matrix, and if so I would like to cluster rows by pattern. By pattern I mean the ordering of the values in a given row. For example,
for row N,
if the value in column 1 < column 2 < column 3 < column 4, then it is, let's say, pattern 1.
Basically there are 4^4 = 256 possible patterns (in theory).
Is there a way to do this in R?
Thanks in advance
Rad
Yes. (Although the number of distinct orderings is only 24 = 4*3*2*1. After the first value is chosen there are only three candidates for the second, then two for the third, and the last one is fixed.) The order function applied to each row gives the permutation of 1, 2, 3, 4 that sorts that row, so "1.2.3.4" means column 1 < column 2 < column 3 < column 4:
mtx <- matrix(rnorm(10000), ncol=4)
res <- apply(mtx, 1, function(x) paste( order(x), collapse=".") )
> table(res)
res
1.2.3.4 1.2.4.3 1.3.2.4 1.3.4.2 1.4.2.3 1.4.3.2
     98     112      95     120     114     118
2.1.3.4 2.1.4.3 2.3.1.4 2.3.4.1 2.4.1.3 2.4.3.1
    101     114     105     102     104     122
3.1.2.4 3.1.4.2 3.2.1.4 3.2.4.1 3.4.1.2 3.4.2.1
    105      82     107      90      97      86
4.1.2.3 4.1.3.2 4.2.1.3 4.2.3.1 4.3.1.2 4.3.2.1
     99      93     100     108     118     110
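To go from pattern labels to actual clusters of rows, one option is to split the row numbers by label. The sketch below uses a tiny hand-written matrix (made-up values, no ties within a row) so the result is easy to check by eye:

```r
# Three illustrative rows; rows 1 and 3 are strictly increasing
mtx <- rbind(
  c(-1.2, -0.3, 0.4,  1.5),
  c( 0.9, -0.7, 0.1, -1.1),
  c(-0.5,  0.0, 0.8,  2.0)
)

# order(x) lists the column indices from the smallest value to the largest
pattern <- apply(mtx, 1, function(x) paste(order(x), collapse = "."))

# Row numbers grouped by pattern label
groups <- split(seq_len(nrow(mtx)), pattern)

# All rows following the strictly increasing pattern
increasing <- mtx[pattern == "1.2.3.4", , drop = FALSE]
```

groups is a named list, so groups[["1.2.3.4"]] gives the row numbers of every row whose columns are in increasing order.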
