Avoid a for loop - r

I am trying to optimize an algorithm and I really want to avoid all my loops. Hence I am wondering if there is a way to avoid the following simple loop:
library(FNN)
data <- cbind(1:10, 1:10)
NN.index <- get.knn(data, 5)$nn.index
bc <- matrix(0, nrow(NN.index), max(NN.index))
for(i in 1:nrow(bc)){
bc[i,NN.index[i,]] <- 1
}
where bc is a matrix of zeros.

In R, if you index a matrix M with a k-by-2 matrix I, each row of I is treated as a (row, column) pair of M. For example:
M = matrix(1:12, nrow = 4, ncol = 3)
print(M)
I = rbind(c(1,2), c(4,2), c(3,3))
print(M[I])
In this case, M[1,2], M[4,2] and M[3,3] are extracted.
In your case, we can create row_index and col_index from NN.index as below, and then assign 1 to the corresponding entries.
bc <- matrix(0, nrow(NN.index), max(NN.index))
row_index <- rep(1:nrow(NN.index), times = ncol(NN.index))
col_index <- as.vector(NN.index)
bc[cbind(row_index, col_index)] <- 1
print(bc)
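As a quick check that the vectorised assignment matches the loop from the question, here is a minimal sketch. It uses a small hand-made index matrix nn (a hypothetical stand-in for NN.index) so that it runs without FNN, and row(nn) as a base R shortcut for the rep() call above.

```r
# Hypothetical stand-in for NN.index: row i lists the columns to set in row i
nn <- rbind(c(2, 3), c(1, 3), c(1, 2), c(3, 5))

# Loop version from the question
bc_loop <- matrix(0, nrow(nn), max(nn))
for (i in seq_len(nrow(nn))) {
  bc_loop[i, nn[i, ]] <- 1
}

# Vectorised version: one (row, column) pair per entry of nn
bc_vec <- matrix(0, nrow(nn), max(nn))
bc_vec[cbind(as.vector(row(nn)), as.vector(nn))] <- 1

# TRUE when both approaches fill exactly the same cells
identical(bc_loop, bc_vec)
```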

Related

Replacing values in the columns of a matrix according to a vector of indices?

I have a matrix of zeroes:
M <- matrix(0, nrow = 10, ncol = 5)
and a vector of indices
V <- c(1,5,3,2,3,4,1,3,2,4)
I want to replace the entries M[i,V[i]] by 1, i in 1:10. How can I do this without using brute force (for loop)? Below is the code to do so using brute force, which is not efficient in higher dimensions:
for(i in 1:10) M[i,V[i]] = 1
You can make a matrix from your vector V and use it directly, i.e.
M[matrix(c(seq_along(V), V), ncol = 2)] <- 1
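An alternative sketch, assuming you are comfortable with R's column-major storage: compute the linear position of each target cell directly. The names lin, M_lin and M_ref are only for illustration.

```r
M <- matrix(0, nrow = 10, ncol = 5)
V <- c(1, 5, 3, 2, 3, 4, 1, 3, 2, 4)

# Column-major storage: element (i, j) sits at position (j - 1) * nrow(M) + i
lin <- (V - 1) * nrow(M) + seq_along(V)
M_lin <- M
M_lin[lin] <- 1

# Reference result from the two-column index matrix
M_ref <- M
M_ref[cbind(seq_along(V), V)] <- 1

# TRUE when both indexing styles hit the same cells
identical(M_lin, M_ref)
```

The two-column matrix form is usually easier to read; the linear form mainly helps when the positions are already computed as single numbers.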

Efficient way to find all combinations in a data frame in R

I am looking for an efficient way in R to derive all possible combinations.
I have a data frame with 3 columns, and I calculate all possible combinations of the values in the first column.
df <- data.frame("H" = c("H1","H2","H3","H4"), "W1" = c(95, 0, 85 ,0) , "W2" = c(50, 85, 0,0))
df$H <- as.character(df$H) # plain character labels, whether or not H was read as a factor
nH <- nrow(df)
nW <- 2
library(plyr)
library(gtools)
if(nW<=5){
# Find all possible combinations
mat1 <- matrix(nrow = 0, ncol = nH)
for(i in 1:nH){
# mat1 <- rbind.fill.matrix(mat1, combinations(nH,nH-(i-1),df$H))
mat1 <- rbind.fill.matrix(mat1, t(combn(df$H,nH-(i-1))))
}
df_comb <- data.frame(mat1)
}
View(df_comb)
df_comb gives the correct output. The code above works well for small data sets, but when the H column has more than 15 values, R runs out of memory.
I am looking for a way to compute the combinations in this scenario efficiently in R for up to 50 values (H1, H2, ..., H49, H50).
EDIT:
I tried a different approach: once the total number of possible combinations exceeds a threshold (32767 in the code below), I generate combinations by random sampling, giving each group size a share of the sample proportional to its number of combinations.
nH <- 26
nW <- 2
if(nW<=5){
# Find all possible combinations ~~~~~ Random Sampling
ncomb <- 0
for(i in 1:nH){
ncomb <- ncomb + choose(nH, nH-(i-1))
}
nmax <- 10000 # Total number of combinations cannot exceed 10000
mat1 <- matrix( nrow = 0, ncol = nH)
for(i in 1:nH){ # For each Group 26C1 26C2 26C3 ..... 26C25 26C26
ncombi <- choose(nH, nH-(i-1)) #For i = 1 , 26C25
ncombComputed <- ceiling(nmax/ncomb*choose(nH, nH-(i-1)))
if(ncomb <= 32767 ){ # This condition is independent of NMAX - For 15
#Combinations
print("sefirst")
final <- mat1
print(paste(nH," ",i))
# df$H must hold nH labels; combinations() comes from gtools
mat1 <- rbind.fill.matrix(mat1, combinations(nH,nH-(i-1),df$H))
}
else {
print(i)
print("second")
combi <- matrix( nrow = 0, ncol = nH-(i-1))
#random sampling
while(nrow(combi) < ncombComputed){
combi <- rbind(combi, sort(sample(df$H, nH-(i-1))))
combi <- unique(combi)
}
mat1 <- rbind.fill.matrix(mat1, combi)
}
}
df_comb_New <- data.frame(mat1)
}
The above code gives the result, but for 26 entries it takes 36 seconds to produce 10000 combinations. Is there a way to optimize the while loop so that it runs faster, or another way to achieve the same result more efficiently?
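One possible direction, offered only as a sketch and not as a tested replacement for the code above: draw the random subsets in batches and deduplicate them as string keys, instead of calling rbind() and unique() on a growing matrix for every single draw. The helper name sample_combos and its arguments are hypothetical, and it assumes the labels contain no tab character.

```r
# Hypothetical helper: draw n_target distinct k-subsets of `items` at random
sample_combos <- function(items, k, n_target) {
  # Never ask for more subsets than exist
  n_target <- min(n_target, choose(length(items), k))
  keys <- character(0)
  while (length(keys) < n_target) {
    need <- n_target - length(keys)
    # Draw a whole batch and encode each sorted subset as one string key
    batch <- vapply(
      seq_len(need),
      function(i) paste(sort(sample(items, k)), collapse = "\t"),
      character(1)
    )
    # Duplicates within the batch or against earlier draws are dropped here
    keys <- unique(c(keys, batch))
  }
  # Decode the keys back into a matrix with one subset per row
  do.call(rbind, strsplit(keys, "\t", fixed = TRUE))
}

set.seed(42)
H <- paste0("H", 1:26)
combos <- sample_combos(H, k = 5, n_target = 200)
dim(combos)
```

Each pass through the while loop requests exactly the number of subsets still missing, so the loop normally runs only a few times unless n_target is close to the total number of possible subsets.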

R: applying function to matrix except individual cell(s)

Suppose I have a 10x10 matrix. How can I fill it with 0's while excluding certain individual cells (preferably in a single operation)?
blank <- matrix(NA,nrow=10,ncol=10)
for (i in 1:10) {for (j in 1:10) {blank[i,j] <- 0 }}
# except blank[2,5], blank[9,3], blank[1,4], to be left NA
It is probably more efficient to declare the matrix as 0s and then assign NA to the small number of exception cells:
blank <- matrix(0, nrow = 10, ncol = 10)
blank[2, 5] <- blank[9, 3] <- blank[1, 4] <- NA
Or, more programmatically:
coords <- list(c(2, 5),
c(9, 3),
c(1, 4))
blank[do.call("rbind", coords)] <- NA
(the key being this part of ?"["):
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
If this is supposed to be a random assignment of NA to a zero matrix then this might suffice.
zero3NA <- matrix(0, 10, 10)
zero3NA[ cbind( sample(nrow(zero3NA), 3), sample(ncol(zero3NA), 3) ) ] <- NA
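Note that sampling rows and columns separately never places two NAs in the same row or the same column. If any three cells should be equally likely, a minimal sketch is to sample linear positions instead; the name cells is only for illustration.

```r
set.seed(1)
zero3NA <- matrix(0, 10, 10)

# Sample 3 distinct cell positions out of all 100, then mark them
cells <- sample(length(zero3NA), 3)
zero3NA[cells] <- NA

# Recover the (row, column) coordinates of the chosen cells if needed
arrayInd(cells, dim(zero3NA))
```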

R: How to write a for loop that reads every two lines in a matrix?

I want to calculate correlation statistics using cor.test(). I have a data matrix where the two members of each pair to be tested are on consecutive rows (I have more than a thousand pairs, so I also need to correct for multiple testing later). I was thinking that I could loop through the matrix two rows at a time and perform the test (i.e. first test the correlation between row1 and row2, then row3 and row4, row5 and row6, etc.), but I don't know how to write this kind of loop.
This is how I do the test on a single pair:
d = read.table(file="cor-test-sample-data.txt", header=T, sep="\t", row.names = 1)
d = as.matrix(d)
cor.test(d[1,], d[2,], method = "spearman")
You could try
res <- lapply(split(seq_len(nrow(mat1)),(seq_len(nrow(mat1))-1)%/%2 +1),
function(i){m1 <- mat1[i,]
if(NROW(m1)==2){
cor.test(m1[1,], m1[2,], method="spearman")
}
else NA
})
To get the p-values
resP <- sapply(res, function(x) x$p.value)
indx <- t(`dim<-`(seq_len(nrow(mat1)), c(2, nrow(mat1)/2)))
names(resP) <- paste(indx[,1], indx[,2], sep="_")
resP
# 1_2 3_4 5_6 7_8 9_10 11_12 13_14
#0.89726818 0.45191660 0.14106085 0.82532260 0.54262680 0.25384239 0.89726815
# 15_16 17_18 19_20 21_22 23_24 25_26 27_28
#0.02270217 0.16840791 0.45563229 0.28533447 0.53088721 0.23453161 0.79235990
# 29_30 31_32
#0.01345768 0.01611903
Or using mapply (assuming that the number of rows is even)
ind <- seq(1, nrow(mat1), by=2) #similar to the one used by #CathG in for loop
mapply(function(i,j) cor.test(mat1[i,], mat1[j,],
method='spearman')$p.value , ind, ind+1)
data
set.seed(25)
mat1 <- matrix(sample(0:100, 20*32, replace=TRUE), ncol=20)
Try
d = matrix(rep(1:9, 3), ncol=3, byrow = T)
sapply(2*(1:(nrow(d)/2)), function(pair) unname(cor.test(d[pair-1,], d[pair,], method="spearman")$estimate))
pvalues<-c()
for (i in seq(1,nrow(d),by=2)) {
pvalues<-c(pvalues,cor.test(d[i,],d[i+1,],method="spearman")$p.value)
}
names(pvalues)<-paste(row.names(d)[seq(1,nrow(d),by=2)],row.names(d)[seq(2,nrow(d),by=2)],sep="_")
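Since the question mentions correcting for the large number of pairs, here is a hedged sketch of that follow-up step using base R's p.adjust(). The simulated mat1 and the choice of the Benjamini-Hochberg method are assumptions for illustration; pick the method that suits your analysis.

```r
set.seed(25)
# Simulated data: 32 rows = 16 pairs; continuous values, so no ties
mat1 <- matrix(rnorm(20 * 32), ncol = 20)

# Index of the first row of each pair
first <- seq(1, nrow(mat1), by = 2)
p_raw <- vapply(
  first,
  function(i) cor.test(mat1[i, ], mat1[i + 1, ], method = "spearman")$p.value,
  numeric(1)
)

# Benjamini-Hochberg adjustment; "holm" or "bonferroni" are more conservative
p_adj <- p.adjust(p_raw, method = "BH")
names(p_adj) <- paste(first, first + 1, sep = "_")
p_adj
```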

How do you find the sample sizes used in calculations on r?

I am running correlations between variables, some of which have missing data, so the sample size for each correlation is likely different. I tried print and summary, but neither of these shows me how big my n is for each correlation. This is a fairly simple problem that I cannot find the answer to anywhere.
Like this?
x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])
You can also get the degrees of freedom like this (for the default Pearson test, df = n - 2, so n = df + 2):
y <- c(1:100,NA)
x <- c(1:100,NA)
cor.test(x,y)$parameter
But I think it would be best if you showed the code you use to estimate the correlation, so that we can give exact help.
Here's an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (certain) numeric columns of a data frame, combine them accordingly, coerce the resulting object to matrix and apply the function.
# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))
# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs
pairwiseN <- function(mat)
{
u <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
h <- expand.grid(x = u, y = u)
f <- function(x, y)
sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
h$n <- mapply(f, h[, 1], h[, 2])
h
}
# Call it
pairwiseN(dd)
The function can easily be improved; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to cut down on the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.
Here is a for-loop implementation of Dennis' function above that outputs an n x n matrix directly, rather than having to reshape the result with pivot_wider(). On my Databricks cluster it cut the compute time for a matrix of 1865 rows x 69 columns from 2.5-3 minutes to 30-40 seconds.
Thanks for your answer, Dennis; this helped me with my work.
pairwise_nxn <- function(mat)
{
cols <- if(is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
nn <- data.frame(matrix(nrow = length(cols), ncol = length(cols)))
rownames(nn) <- colnames(nn) <- cols
f <- function(x, y)
sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
for (i in 1:nrow(nn))
for (j in 1:ncol(nn))
nn[i,j] <- f(rownames(nn)[i], colnames(nn)[j])
nn
}
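A loop-free sketch of the same n x n table, assuming the data are in a numeric matrix: turn the matrix into a 0/1 indicator of non-missing values and take its cross-product, so that entry (i, j) counts the rows where columns i and j are both present. The names ok and n_pairs are only for illustration.

```r
set.seed(1)
xx <- rnorm(300)
xx[sample(300, 40)] <- NA
dd <- matrix(xx, ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))

# 1 where a value is present, 0 where it is NA
ok <- (!is.na(dd)) * 1

# t(ok) %*% ok: entry (i, j) = number of rows complete for columns i and j
n_pairs <- crossprod(ok)
n_pairs
```

The diagonal holds the number of non-missing values in each single column.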
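If your variables are vectors named a and b, would something like sum(!is.na(a) & !is.na(b)) help you? It counts the pairs in which both values are present, which is the n used for the correlation; sum(is.na(a) | is.na(b)) gives the number of pairs that are dropped.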
