Matching and replacing with for loops - r

I am having issues with a for loop.
I am trying to take the elements a b c d from each pathway (pathway matrix), match them to the expression data (expression matrix), and put them into a new matrix that looks like the pathway matrix but contains the values from the expression matrix.
I am trying to achieve this final matrix:
a <- c("pathway","1","4","7","pathway-2","1","e","g","pathway-3","4","g","h")
pathway<-matrix(a,3,4, byrow=T)
I hope the code is easier to understand than my wording. Here is the input data and my attempt:
a <- c("pathway","b","c","d","pathway-2","b","e","g","pathway-3","c","g","h")
pathway<-matrix(a,3,4, byrow=T)
b <- c("b",1,"c",4,"d",7)
expression<-matrix(b,3,2, byrow=T)
new<-matrix("a",3,4)
new[1:3,1]<-pathway[,1]
for (x in 1:nrow(expression)){
for (y in 1:ncol(pathway)){
if(expression[x,1]==pathway[x,y]){
new[x,y]<-expression[x,2]
}
}
}

Here is one way to do it. We match each column of pathway[,-1] with the expression[,1] matrix, and use the resulting matrix as index for the values from expression[,2]. The ones not found return NA so we index them and replace them from the original matrix. Then cbind as usual to get desired matrix.
new_m <- apply(pathway[, -1], 2, function(i) expression[,2][match(i, expression[,1])])
new_m[which(is.na(new_m))] <- pathway[,-1][which(is.na(new_m))]
cbind(pathway[,1], new_m)
# [,1] [,2] [,3] [,4]
#[1,] "pathway" "1" "4" "7"
#[2,] "pathway-2" "1" "e" "g"
#[3,] "pathway-3" "4" "g" "h"
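For reference, the loop in the question compares expression[x,1] only with row x of pathway, so a key that sits in a different row is never found. A minimal sketch of a loop that does work, assuming the pathway and expression matrices from the question (the names r, j and hit are only illustrative):

```r
pathway <- matrix(c("pathway",   "b", "c", "d",
                    "pathway-2", "b", "e", "g",
                    "pathway-3", "c", "g", "h"), 3, 4, byrow = TRUE)
expression <- matrix(c("b", 1, "c", 4, "d", 7), 3, 2, byrow = TRUE)

# Start from a copy so that unmatched cells keep their original value.
new <- pathway
for (r in seq_len(nrow(pathway))) {
  for (j in 2:ncol(pathway)) {          # column 1 holds the pathway name
    # Look the cell up in ALL rows of expression, not only in row r.
    hit <- match(pathway[r, j], expression[, 1])
    if (!is.na(hit)) new[r, j] <- expression[hit, 2]
  }
}
new
```

Starting from a copy of pathway avoids the separate NA replacement step, at the cost of an explicit double loop.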

R Populate multi-dimensional array with input from alternating vectors

I'm trying to populate a multi-dimensional array with two vectors of the same length. The input data should alternate between the vectors, so that the first input is the first object of the first vector, the second input is the first object of the second vector and so on.
I searched for similar problems on this site and found the function rbind(), however, this will not work as soon as my third dimension is unequal to 1.
In short, I want to achieve this:
a <- 1:6
b <- c("a","b","c","d","e","f")
# output array
, , 1
[,1] [,2]
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
, , 2
[,1] [,2]
[1,] "4" "d"
[2,] "5" "e"
[3,] "6" "f"
I have a working solution below using three for-loops, but this seems overly complicated.
a <- 1:6
b <- c("a","b","c","d","e","f")
len <- prod(length(a)+length(b))
myarray <- array(rep(F,len),dim=c(3,2,2))
counter <- 1
for (n in 1:dim(myarray)[3]) { # n 2
for (r in 1:dim(myarray)[1]) { # rows 3
for (c in 1:dim(myarray)[2]) { # columns 2
if (c %% 2 != 0) {
myarray[r,c,n] <- a[counter]
} else {
myarray[r,c,n] <- b[counter]
}
}
counter <- counter + 1
}
}
Is there an easier approach?
(I'm sure I'm missing something very simple here, but I'm new to R and can't figure it out myself)
Thank you for reading!
[EDIT]
The code should be applicable to a data set with any vector length and any dimension dim = c(x,y,z).
Example data can be found on Dryad Database https://doi.org/10.5061/dryad.mp713, "Table 1 Arctic char landmarks", which contains 13 pairs of x-y-coordinates from 121 individuals of arctic char fish (dim=c(13,2,121)).
Here is my solution for the problem with dim = c(13,2,121):
M <- cbind(a, b)
array(sapply(seq(1, length(a), 13), function(i) M[i:(i+12),]), c(13,2,121))
Do not forget to store the result (Mneu <- ...).
For your small example:
M <- cbind(a, b);
array(sapply(seq(1, length(a), 3), function(i) M[i:(i+2),]), c(3,2,2))
Form an array and then permute the dimensions:
aperm(array(cbind(a, b), c(3, 2, 2)), c(1, 3:2))
giving:
, , 1
[,1] [,2]
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
, , 2
[,1] [,2]
[1,] "4" "d"
[2,] "5" "e"
[3,] "6" "f"
Note
We can generalize the example slightly:
n <- 6 # must be 26 or less so that we can use letters below
a <- 1:n
b <- head(letters, n)
aperm(array(cbind(a, b), c(n/2,2,2)), c(1, 3:2))
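The same idea can be wrapped for an arbitrary number of rows per slice. This is only a sketch: interleave_array and its rows argument are hypothetical names, and it assumes both vectors have the same length and that this length is a multiple of rows.

```r
# Hypothetical helper: put a in column 1 and b in column 2 of an array
# with dim c(rows, 2, length(a) / rows).
interleave_array <- function(a, b, rows) {
  stopifnot(length(a) == length(b), length(a) %% rows == 0)
  slices <- length(a) / rows
  # cbind() stacks all of a, then all of b; array() cuts that into
  # rows x slices x 2, and aperm() swaps the last two dimensions.
  aperm(array(cbind(a, b), c(rows, slices, 2)), c(1, 3, 2))
}

res <- interleave_array(1:6, letters[1:6], rows = 3)
dim(res)
res[, , 2]
```

For the landmark data described in the question, the call would presumably be interleave_array(a, b, rows = 13).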

Paste together two data tables in R [duplicate]

I want to paste the cells of a matrix together, but when I use paste() it returns a vector. Is there a direct function for this in R?
mat <- matrix(1:4,2,2)
paste(mat,mat,sep=",")
I want the output as
[,1] [,2]
[1,] 1,1 2,2
[2,] 3,3 4,4
A matrix in R is just a vector with an attribute specifying the dimensions. When you paste them together you are simply losing the dimension attribute.
So,
matrix(paste(mat,mat,sep=","),2,2)
Or, e.g.
mat1 <- paste(mat,mat,sep=",")
> mat1
[1] "1,1" "2,2" "3,3" "4,4"
> dim(mat1) <- c(2,2)
> mat1
[,1] [,2]
[1,] "1,1" "3,3"
[2,] "2,2" "4,4"
Here's just one example of how you might write a simple function to do this:
paste_matrix <- function(...,sep = " ",collapse = NULL){
n <- max(sapply(list(...),nrow))
p <- max(sapply(list(...),ncol))
matrix(paste(...,sep = sep,collapse = collapse),n,p)
}
...but the specific function you want will depend on how you want it to handle more than two matrices, matrices of different dimensions or possibly inputs that are totally unacceptable (random objects, NULL, etc.).
This particular function recycles the vector and outputs a matrix with the dimension matching the largest of the various inputs.
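A short usage sketch of that function, with two made-up inputs (m1, m2 and the 1 x 1 matrix are illustrative only). The function body is repeated so that the block runs on its own:

```r
# Copied from the answer above.
paste_matrix <- function(..., sep = " ", collapse = NULL) {
  n <- max(sapply(list(...), nrow))
  p <- max(sapply(list(...), ncol))
  matrix(paste(..., sep = sep, collapse = collapse), n, p)
}

m1 <- matrix(1:4, 2, 2)
m2 <- matrix(5:8, 2, 2)

# Same dimensions: cells are pasted element by element.
same_dim <- paste_matrix(m1, m2, sep = ",")

# Different dimensions: the 1 x 1 matrix is recycled over all four cells.
recycled <- paste_matrix(m1, matrix("x", 1, 1), sep = ",")

same_dim
recycled
```

Note that passing a non-NULL collapse makes paste() return a single string, which matrix() would then repeat in every cell, so that argument is probably best left at its default here.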
An alternative to joran's approach is to use [] instead of reconstructing the matrix. That way you also keep attributes such as the colnames:
truc <- matrix(c(1:3, LETTERS[3:1]), ncol=2)
colnames(truc) <- c("A", "B")
truc[] <- paste(truc, truc, sep=",")
truc
# A B
# [1,] "1,1" "C,C"
# [2,] "2,2" "B,B"
# [3,] "3,3" "A,A"
Or use sprintf with dim<-:
`dim<-`(sprintf('%d,%d', mat, mat), dim(mat))
# [,1] [,2]
#[1,] "1,1" "3,3"
#[2,] "2,2" "4,4"
The ascii library has a function paste.matrix for element-wise paste across matrices. The output is the transpose to the desired outcome, but that's easy to address with t().
library(ascii)
mat <- matrix(1:4,2,2)
t(paste.matrix(mat,mat,sep=","))
[,1] [,2]
[1,] "1,1" "2,2"
[2,] "3,3" "4,4"

Find the minimum hamming distance between a string and a long vector of strings (fast)

I need to calculate the hamming distance between an input string and a large string dataset. (All strings in the dataset have the same length as the input string.)
For example, if
input <- "YNYYEY"
dataset <- c("YNYYEE", "YNYYYY", "YNENEN", "YNYYEY")
the hamming distance between input and each string in dataset is 1, 1, 3, 0 so the minimum is 0. I have written a function to calculate the hamming distance between two strings:
HD <- function(str1, str2){
str1 <- as.character(str1)
str2 <- as.character(str2)
length.str1 <- nchar(str1)
length.str2 <- nchar(str2)
string.temp1 <- c()
for (i in 1:length.str1){
string.temp1[i] = substr(str1, start=i, stop=i)
}
string.temp2 <- c()
for (i in 1:length.str2){
string.temp2[i] = substr(str2, start=i, stop=i)
}
return(sum(string.temp1 != string.temp2))
}
But the dataset is too big, so I need to speed this up. Do you have any idea how I can do it quickly? Thank you for your help.
At R level you can use strsplit, cbind, !=, colSums and min. They are all "vectorized".
a <- "YNYYEY"
b <- c("YNYYEE", "YNYYYY", "YNENEN", "YNYYEY")
A <- strsplit(a, split = "")[[1]]
#[1] "Y" "N" "Y" "Y" "E" "Y"
B <- do.call("cbind", strsplit(b, split = ""))
# [,1] [,2] [,3] [,4]
#[1,] "Y" "Y" "Y" "Y"
#[2,] "N" "N" "N" "N"
#[3,] "Y" "Y" "E" "Y"
#[4,] "Y" "Y" "N" "Y"
#[5,] "E" "Y" "E" "E"
#[6,] "E" "Y" "N" "Y"
D <- colSums(A != B)
#[1] 1 1 3 0
min(D)
#[1] 0
This kind of "vectorization" creates many temporary matrices / vectors and uses plenty of RAM. But hopefully it is worthwhile.
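A variation on the same vectorized idea is to compare integer code points instead of one-character strings. The sketch below is not benchmarked; min_hamming is a hypothetical helper name, and it assumes every string in the dataset has the same number of characters as the input.

```r
# Hypothetical helper; assumes every string has the same number of
# characters as `input`.
min_hamming <- function(input, dataset) {
  ref <- utf8ToInt(input)
  # One column of code points per dataset string.
  codes <- vapply(dataset, utf8ToInt, integer(length(ref)), USE.NAMES = FALSE)
  # ref is recycled down each column, so positions are compared one to one.
  d <- colSums(codes != ref)
  list(distances = d, minimum = min(d))
}

res <- min_hamming("YNYYEY", c("YNYYEE", "YNYYYY", "YNENEN", "YNYYEY"))
res$distances
res$minimum
```

vapply() stops with an error if any string has a different length, which doubles as a cheap input check.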
At C / C++ level you can do much, much better (the original answer linked a case study here), but I am not keen on writing C / C++ code today.
I came across the stringdist package (there is even a stringdist tag). The function stringdist relies on a workhorse routine, stringdist:::do_dist, which is written in C. It spares me the effort.
library(stringdist)
d <- stringdist(a, b, method = "hamming")
#[1] 1 1 3 0
min(d)
#[1] 0
stringdist() runs almost ten times slower than colSums().
That is really interesting. Probably its C code or R code is doing something more complicated.
You cannot do better than O(n): you have to go through the whole dataset and calculate the distance for each observation.
The only possible improvement is on the dataset side. If you sort the observations, it may become easier to find an exact match for the input string (a distance of 0).

positions of non-NA cells in a matrix

Consider the following matrix,
m <- matrix(letters[c(1,2,NA,3,NA,4,5,6,7,8)], 2, byrow=TRUE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "b" NA "c" NA
## [2,] "d" "e" "f" "g" "h"
I wish to obtain the column indices corresponding to all non-NA elements, merged with the NA elements immediately following:
result <- c(list(1), list(2:3), list(4:5),
list(1), list(2), list(3), list(4), list(5))
Any ideas?
The column (and row) indices of non-NA elements can be obtained with
which(!is.na(m), TRUE)
A full answer:
Since you want to work row-wise, but R stores matrices column-wise, it is easier to work on the transpose of m.
t_m <- t(m)
n_cols <- ncol(m)
We get the array indices as mentioned above, which gives the start point of each list.
ind_non_na <- which(!is.na(t_m), TRUE)
Since we are working on the transpose, we want the row indices, and we need to deal with each column separately.
start_points <- split(ind_non_na[, 1], ind_non_na[, 2])
The length of each list is given by the difference between starting points, or the difference between the last point and the end of the row (+1). Then we just call seq to get a sequence.
unlist(
lapply(
start_points,
function(x)
{
len <- c(diff(x), n_cols - x[length(x)] + 1L)
mapply(seq, x, length.out = len, SIMPLIFY = FALSE)
}
),
recursive = FALSE
)
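An alternative sketch that works directly on the rows of m, without transposing: within each row, cumsum(!is.na(...)) assigns a running group number that only increases at non-NA cells, so each NA falls into the group of the non-NA cell before it. The helper name group_cols is hypothetical, and the sketch assumes every row starts with a non-NA cell (a leading NA would otherwise form a group of its own).

```r
m <- matrix(letters[c(1, 2, NA, 3, NA, 4, 5, 6, 7, 8)], 2, byrow = TRUE)

# Hypothetical helper: column indices of each non-NA cell, merged with
# the NA cells that immediately follow it in the same row.
group_cols <- function(m) {
  per_row <- lapply(seq_len(nrow(m)), function(r) {
    grp <- cumsum(!is.na(m[r, ]))        # group id changes at each non-NA
    unname(split(seq_len(ncol(m)), grp)) # column indices per group
  })
  unlist(per_row, recursive = FALSE)     # flatten to one list
}

res <- group_cols(m)
str(res)
```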
This will get you close:
cols <- col(m)
cbind(cols[which(is.na(m))-1],cols[is.na(m)])
[,1] [,2]
[1,] 2 3
[2,] 4 5
