R: applying a function to a matrix except individual cell(s)

Suppose I have a 10x10 matrix. How can I fill it with 0's while excluding certain individual cells (preferably in a single operation)?
blank <- matrix(NA,nrow=10,ncol=10)
for (i in 1:10) {for (j in 1:10) {blank[i,j] <- 0 }}
# except blank[2,5], blank[9,3], blank[1,4], to be left NA

It is probably more efficient to declare the matrix as 0s and then assign NA to the small number of exception cells:
blank <- matrix(0, nrow = 10, ncol = 10)
blank[2, 5] <- blank[9, 3] <- blank[1, 4] <- NA
Or, more programmatically:
coords <- list(c(2, 5),
               c(9, 3),
               c(1, 4))
blank[do.call("rbind", coords)] <- NA
(the key being this part of ?"["):
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
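For example, a minimal illustration of that rule on a small throwaway matrix:
m <- matrix(1:9, nrow = 3)
idx <- rbind(c(1, 2),
             c(3, 3))
m[idx]        # extracts m[1, 2] and m[3, 3], i.e. 4 and 9
m[idx] <- NA  # the same index matrix also works on the left of an assignment
m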

If this is supposed to be a random assignment of NA to a zero matrix then this might suffice.
zero3NA <- matrix(0, 10, 10)
zero3NA[ cbind( sample(nrow(zero3NA), 3), sample(ncol(zero3NA), 3) ) ] <- NA
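Sampling the rows and columns separately guarantees three distinct rows and three distinct columns, so no cell is hit twice. To see where the NAs landed:
which(is.na(zero3NA), arr.ind = TRUE)  # row/column positions of the sampled cells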

Related

Replace all values with NA in an R dataframe

I have a set of data
x <- seq(1, 10, by=1)
y <- seq(1, 10, by=1)
data <- expand.grid(x,y)
I would like to create a new data set called NA_data that replaces all 100 values in that data frame with NA.
Just multiply by NA: any operation with NA results in NA, so here we could multiply, add (+), subtract (-), or divide (/) and it would still return NA.
NA_data <- data * NA
Or another option is
NA_data <- data[rep(nrow(data) + 1, nrow(data)),]
Or use the matrix way
NA_data <- as.data.frame(matrix(nrow = nrow(data),
                                ncol = ncol(data),
                                dimnames = list(NULL, names(data))))
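Whichever option you choose, a quick sanity check on the objects defined above confirms that everything is NA while the shape and column names are preserved:
all(is.na(NA_data))                     # TRUE
dim(NA_data)                            # 100 rows, 2 columns, as in data
identical(names(NA_data), names(data))  # TRUE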

Replacing values in the columns of a matrix according to a vector of indices?

I have a matrix of zeroes:
M <- matrix(0, nrow = 10, ncol = 5)
and a vector of indices
V <- c(1,5,3,2,3,4,1,3,2,4)
I want to replace the entries M[i, V[i]] with 1, for i in 1:10. How can I do this without brute force (a for loop)? Below is the brute-force code, which is not efficient in higher dimensions:
for(i in 1:10) M[i,V[i]] = 1
You can make a matrix from your vector V and use it directly, i.e.
M[matrix(c(seq_along(V), V), ncol = 2)] <- 1
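As a quick check, this produces exactly the same matrix as the loop, and cbind() is a convenient shorthand for building the index matrix:
M_loop <- matrix(0, nrow = 10, ncol = 5)
for (i in 1:10) M_loop[i, V[i]] <- 1
identical(M, M_loop)             # TRUE

M2 <- matrix(0, nrow = 10, ncol = 5)
M2[cbind(seq_along(V), V)] <- 1
identical(M, M2)                 # TRUE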

Create subset matrix according to criteria/ Extract key rows according to criteria

I want to subset the rows of my original matrix into two separate matrices.
I setup the problem as follows:
set.seed(2)
Mat1 <- data.frame(matrix(nrow = 4, ncol =10, data = rnorm(40,0,1)))
keep.rows = matrix(nrow =2, ncol =4)
keep.rows[,1] = c(1,2)
keep.rows[,2] = c(2,3)
keep.rows[,3] = c(2,3)
keep.rows[,4] = c(1,2)
Mat1
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0.9959846 -2.2079198 -0.3869496 -1.183606 1.959357077 1.0744594 -0.8621983 -0.4213736 0.4718595 1.2309537
2 -1.6957649 1.8221225 0.3866950 -1.358457 0.007645872 0.2605978 2.0480403 -0.3508344 1.3589398 1.1471368
3 -0.5333721 -0.6533934 1.6003909 -1.512671 -0.842615198 -0.3142720 0.9399201 -1.0273806 0.5641686 0.1065980
4 -1.3722695 -0.2846812 1.6811550 -1.253105 -0.601160105 -0.7496301 2.0086871 -0.2505191 0.4559801 -0.7833167
Mat1 is my original matrix. Now, from the keep.rows matrix, I want to create two output matrices. The first output matrix (Output1) should store all the rows specified in keep.rows. The second output matrix (Output2) should store all remaining rows. In my actual application my matrices are very large, so they cannot be sorted manually as I do here.
I need:
1) A function that does this simply over large matrices.
2) Ideally one where I can change the number of entries to "keep" each time. So in this case I store 3 entries at a time. However, imagine my keep.rows matrix was 2x2. In this case, I might want to store five entries each time.
Results should be of the form:
Output1 <- data.frame(matrix(nrow = 2, ncol =10))
Output1[1:2,1:3] <- Mat1[c(1,2), 1:3]
Output1[1:2,4:6] <- Mat1[c(2,3), 4:6]
Output1[1:2,7:9] <- Mat1[c(2,3), 7:9]
Output1[1:2,10] <- Mat1[c(1,2), 10]
Output2 <- data.frame(matrix(nrow = 2, ncol =10))
Output2[1:2,1:3] <- Mat1[c(3,4), 1:3]
Output2[1:2,4:6] <- Mat1[c(1,4), 4:6]
Output2[1:2,7:9] <- Mat1[c(1,4), 7:9]
Output2[1:2,10] <- Mat1[c(3,4), 10]
IMPORTANT: In the answer I need Output2 to be specified in a way that keeps all remaining rows. In my application my keep.rows matrix is the same size, but Mat1 contains 1000+ rows.
You can use sapply, which iterates over the columns of Mat1 with seq_along(Mat1) and subsets Mat1 using keep.rows. With do.call(cbind, ...) you get a matrix-like data.frame from the list returned by sapply. To get the remaining data you simply place a - before keep.rows.
Output1 <- do.call(cbind, sapply(seq_along(Mat1), function(i) Mat1[keep.rows[,(i+2) %/% 3], i, drop = FALSE], simplify = FALSE))
Output2 <- do.call(cbind, sapply(seq_along(Mat1), function(i) Mat1[-keep.rows[,(i+2) %/% 3], i, drop = FALSE], simplify = FALSE))
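The (i + 2) %/% 3 term simply maps each column of Mat1 to the matching column of keep.rows (columns 1-3 to 1, 4-6 to 2, 7-9 to 3, and 10 to 4), which you can verify directly:
sapply(seq_along(Mat1), function(i) (i + 2) %/% 3)
# [1] 1 1 1 2 2 2 3 3 3 4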

Avoid a for loop

I am trying to optimize an algorithm and I really want to avoid all my loops. Hence I am wondering if there is a way to avoid the following simple loop:
library(FNN)
data <- cbind(1:10, 1:10)
NN.index <- get.knn(data, 5)$nn.index
bc <- matrix(0, nrow(NN.index), max(NN.index))
for (i in 1:nrow(bc)) {
  bc[i, NN.index[i, ]] <- 1
}
where bc is a matrix of zeros.
In R, if a matrix M is indexed with a k-by-2 matrix I inside single brackets, then each row of I is interpreted as a (row, column) index into M. For example:
M = matrix(1:12, nrow = 4, ncol = 3)
print(M)
I = rbind(c(1,2), c(4,2), c(3,3))
print(M[I])
In this case, M[1,2], M[4,2] and M[3,3] are extracted.
In your case, we can create row_index and col_index from NN.index as below, and then assign 1 to the corresponding entries.
bc <- matrix(0, nrow(NN.index), max(NN.index))
row_index <- rep(1:nrow(NN.index), times = ncol(NN.index))
col_index <- as.vector(NN.index)
bc[cbind(row_index, col_index)] <- 1
print(bc)
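A quick comparison against the original loop confirms that the two approaches agree:
bc_loop <- matrix(0, nrow(NN.index), max(NN.index))
for (i in 1:nrow(bc_loop)) bc_loop[i, NN.index[i, ]] <- 1
identical(bc, bc_loop)  # TRUE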

How do you find the sample sizes used in calculations in R?

I am running correlations between variables, some of which have missing data, so the sample sizes for each correlation are likely different. I tried print and summary, but neither of these shows me how big my n is for each correlation. This is a fairly simple problem that I cannot find the answer to anywhere.
like this..?
x <- c(1:100,NA)
length(x)
length(x[!is.na(x)])
you can also get the degrees of freedom like this...
y <- c(1:100,NA)
x <- c(1:100,NA)
cor.test(x,y)$parameter
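Since the default Pearson test reports n - 2 degrees of freedom, the sample size actually used is just that parameter plus 2:
unname(cor.test(x, y)$parameter) + 2  # 100, because the pair containing NA is dropped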
But I think it would be best if you show the code for how you are estimating the correlation, so you can get exact help.
Here's an example of how to find the pairwise sample sizes among the columns of a matrix. If you want to apply it to (certain) numeric columns of a data frame, combine them accordingly, coerce the resulting object to matrix and apply the function.
# Example matrix:
xx <- rnorm(3000)
# Generate some NAs
vv <- sample(3000, 200)
xx[vv] <- NA
# reshape to a matrix
dd <- matrix(xx, ncol = 3)
# find the number of NAs per column
apply(dd, 2, function(x) sum(is.na(x)))
# tack on some column names
colnames(dd) <- paste0("x", seq(3))
# Function to find the number of pairwise complete observations
# among all pairs of columns in a matrix. It returns a data frame
# whose first two columns comprise all column pairs
pairwiseN <- function(mat)
{
  u <- if (is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  h <- expand.grid(x = u, y = u)
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  h$n <- mapply(f, h[, 1], h[, 2])
  h
}
# Call it
pairwiseN(dd)
The function can easily be improved; for example, you could set h <- expand.grid(x = u[-1], y = u[-length(u)]) to cut down on the number of calculations, you could return an n x n matrix instead of a three-column data frame, etc.
Here is a for-loop implementation of Dennis' function above that outputs an n x n matrix rather than having to pivot_wider() that result. On my Databricks cluster it cut the compute time for a 1865-row x 69-column matrix from 2.5-3 minutes down to 30-40 seconds.
Thanks for your answer Dennis, this helped me with my work.
pairwise_nxn <- function(mat)
{
  cols <- if (is.null(colnames(mat))) paste0("x", seq_len(ncol(mat))) else colnames(mat)
  nn <- data.frame(matrix(nrow = length(cols), ncol = length(cols)))
  rownames(nn) <- colnames(nn) <- cols
  f <- function(x, y)
    sum(apply(mat[, c(x, y)], 1, function(z) !any(is.na(z))))
  for (i in 1:nrow(nn))
    for (j in 1:ncol(nn))
      nn[i, j] <- f(rownames(nn)[i], colnames(nn)[j])
  nn
}
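A quick call on the example matrix dd from above returns the same pairwise counts as pairwiseN(), just arranged as a 3 x 3 table:
pairwise_nxn(dd)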
If your variables are vectors named a and b, would something like sum(is.na(a) | is.na(b)) help you?
