crossproduct matrix unexpectedly full of NAs - r

So I took some information from a CSV, stored it as a matrix, and tried to compute the following operations on the result, but it gave me a 2x2 array of NA. Not seeing the problem here.
data <- read.csv('qog.csv', sep=';')
X <- matrix( log( data$wdi_gnipc ) )
X <- cbind(X, data$ciri_empinx_new)
t(X) %*% X
When I look at X and t(X) they look like how I expect them to, so I am matrix-multiplying a 2xn matrix with an nx2 matrix (n is some large number like 193) and so the matrix product should be well-defined and give a meaningful 2x2 answer.
Any ideas what could be going wrong?
Note: When I try
a <- rbind(c(1,2), c(3,4))
t(a) %*% a
it gives the desired result. Not sure what the important difference is between that and what I'm doing with the data.

Let's make that an answer. For the cross product to be filled with NA, you must have at least one NA in every column of X, because a single NA in a column propagates through every sum that column takes part in. You can find the number of NAs per column by running:
colSums(is.na(X))
and in all likelihood you will find that
all(colSums(is.na(X)) > 0)
# [1] TRUE
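If that is the case, one possible way forward is to drop the incomplete rows before multiplying. The sketch below uses a small made-up matrix (not the qog.csv data) and assumes that discarding rows with a missing value is acceptable for your analysis:

```r
# Toy matrix with one NA in each column (values are made up for illustration)
X <- cbind(c(1, 2, NA, 4), c(2, NA, 1, 3))

na_per_col <- colSums(is.na(X))   # number of NAs in each column
full_na <- t(X) %*% X             # every entry is NA

# Keep only the rows with no missing values, then take the cross product
Xc <- X[complete.cases(X), , drop = FALSE]
cp <- crossprod(Xc)               # same as t(Xc) %*% Xc
```

Whether dropping rows is appropriate depends on why the values are missing in the first place.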

Related

Non-conformable arguments when using apply

I have a 10000 x 7 data.table dat . I would like to multiply each row of dat by a 7x7 matrix c. I have tried the following
apply(dat, 1, function(x) sqrt(as.matrix(x) %*% as.matrix(c) %*% t(as.matrix(x))))
I get this error
Error in as.matrix(x) %*% as.matrix(c) : non-conformable arguments
This function works when I take a single row from dat (so 1 x 7) but not when I use apply.
How do I make this work using apply?
Thanks!
Additional info - I could achieve what I need another way. I could multiply the entire data.frame by the matrix and take sqrt(diag(x)). However, I need to do this a lot of times, so it would be more efficient to take this row by row and return a single figure.
I think you should use t(as.matrix(x))%*% as.matrix(c) %*% as.matrix(x) in your apply function, since the argument as.matrix(x) is indeed a column-vector (not a row-vector).
res <- apply(dat, 1, function(x) sqrt(t(as.matrix(x))%*% as.matrix(c) %*% as.matrix(x)))
Example
set.seed(1)
dat <- data.frame(matrix(sample(70),ncol = 7))
c <- matrix(sample(49),ncol = 7)
res <- apply(dat, 1, function(x) sqrt(t(as.matrix(x))%*% as.matrix(c) %*% as.matrix(x)))
such that
> res
[1] 1522.7206 1208.6306 1105.7509 1063.4341 1066.3423 1124.8271
[7] 1219.2280 1665.8301 1609.4704 954.3694
Note: c() is a commonly used function in R, so using c as a variable name is not good practice. I use c_ below instead.
When multiplying matrices the number of columns in the first matrix needs to be the same as the number of rows in the second. In the as.matrix(x) %*% as.matrix(c) part in your code the first matrix has one column and the second has 7 rows. That is why you get the error.
Multiplying the transposed vector (1 x 7) by c_ first, and then by the untransposed vector (7 x 1), fixes this.
apply(dat, 1, function(x) sqrt(t(as.matrix(x)) %*% as.matrix(c_) %*% (as.matrix(x))))
Or making the function more explicit in regard to the matrix you want to create from the row also works:
apply(dat, 1, function(x) sqrt(matrix(x, 1) %*% c_ %*% t(matrix(x, 1))))
Both solutions produce the same results.
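Since the question mentions efficiency, here is a rough sketch of a fully vectorized alternative that skips apply altogether. It assumes dat can be converted to a numeric matrix and reuses the example data from above; M, res_apply and res_vec are names made up for this illustration.

```r
set.seed(1)
dat <- data.frame(matrix(sample(70), ncol = 7))
c_  <- matrix(sample(49), ncol = 7)
M   <- as.matrix(dat)

# Row-by-row quadratic form x' C x, as in the apply-based answers
res_apply <- apply(M, 1, function(x) sqrt(drop(t(x) %*% c_ %*% x)))

# Vectorized: row i of (M %*% c_) * M sums to x_i' C x_i
res_vec <- sqrt(rowSums((M %*% c_) * M))

same <- isTRUE(all.equal(unname(res_apply), unname(res_vec)))
```

This computes only the diagonal of the full product, so it never builds the 10000 x 10000 matrix that sqrt(diag(...)) would require.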

Replace values in a matrix in R subsetting from vectors

I want to replace values in a matrix based on matrix indexes stored in two vectors (one for x, another one for y). I did it some time ago but forgot the syntax for subsetting based on vectors.
Let's say I have this matrix and these 2 vectors:
m <- matrix(0,10,10)
x <- c(1,3,5)
y <- c(2,4,6)
And I need to replace m[1,2], m[3,4], m[5,6] with another value. What would be the syntax in this case? I tried m[x,y], but it doesn't work.
Without sparse matrix support:
If we include z <- c(4.5,5.6,6.7) for the values then,
for(i in 1:length(z)) m[x[i],y[i]] <- z[i]
If you want an apply solution, this is all I could think of,
apply(data.frame(x=x,y=y,z=z),1,function(row) .GlobalEnv$m[row[1],row[2]] <- row[3])
I remembered how it was: to subset a matrix using index vectors, the syntax is:
m[cbind(x,y)]
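The same two-column index matrix also works on the left-hand side of an assignment, which is what the question asks for. A minimal sketch, reusing the z values suggested in the other answer:

```r
m <- matrix(0, 10, 10)
x <- c(1, 3, 5)
y <- c(2, 4, 6)
z <- c(4.5, 5.6, 6.7)

# Each row of cbind(x, y) is one (row, column) pair
m[cbind(x, y)] <- z

picked <- m[cbind(x, y)]   # reads the three replaced cells back
```

By contrast, m[x, y] selects the whole 3 x 3 block formed by rows x and columns y, which is why it did not do what was wanted.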

Matrix multiplication using variable element producing Error non-conformable arguments

I am a newbie to R, but avid to learn.
I have been trying endlessly to create a matrix with a variable element (in this case [2,2]). The variable element should take number 4 on the first run and 5 on the second (numbers).
This matrix would be multiplied by another matrix (N0) and produce a result matrix (resul).
Up so far, I have only been able to create the initial matrix with the variable element using a for loop, but I am having problems indexing the result matrix. I have tried several versions, but this is the latest. Any suggestions would be greatly appreciated. Thank you.
numbers <- c(4,5,length.out = 2)
A <- matrix(c(1,2,3,NA),nrow=2,ncol=2)
resul <- matrix(nrow=2,ncol=1)
for (i in 1:2) {
A[2,2]<- matrix(numbers[i])
N0 <- matrix(c(1,2),nrow=2,ncol=1)
resul[i,]<- A[i,i]%*%N0
}
Your code has two distinct problems. The first is that A[i,i] is a single number (effectively a 1 x 1
matrix), so you're getting an error because you're multiplying a 1 x 1 matrix
by a 2 x 1 matrix (N0).
You could either drop the subscript [i,i] and initialize the result to be
a two by two matrix like so:
result <- matrix(nrow=2,ncol=2)
for (i in 1:2){
A[2,2]<- matrix(numbers[i])
# a column vector
N0 <- matrix(c(1,2),
nrow=2,
ncol=1)
# note the index is on the column b/c `A%*%N0` is a column matrix
result[,i]<- A%*%N0
}
Or you could drop only the second subscript, using A[i,] instead of A[i,i], and keep the result as
a two by one matrix like so:
result <- matrix(nrow=2,ncol=1)
for (i in 1:2){
A[2,2]<- matrix(numbers[i])
# a column vector
N0 <- matrix(c(1,2),
nrow=2,
ncol=1)
result[i,]<- A[i,]%*%N0
}
but it's not clear from your post which (if either) answer is the correct one. Indexing is tricky :)
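For reference, here is a compact sketch that runs both variants side by side. It assumes numbers is meant to be just c(4, 5); c() has no length.out argument, so the original call silently adds a third element named length.out. The names res_col and res_row are made up for this illustration.

```r
n_len <- length(c(4, 5, length.out = 2))   # 3, not 2

numbers <- c(4, 5)
A  <- matrix(c(1, 2, 3, NA), nrow = 2, ncol = 2)
N0 <- matrix(c(1, 2), nrow = 2, ncol = 1)

res_col <- matrix(nrow = 2, ncol = 2)   # one column per run of the loop
res_row <- matrix(nrow = 2, ncol = 1)   # one number per run of the loop

for (i in 1:2) {
  A[2, 2] <- numbers[i]
  res_col[, i] <- A %*% N0        # full 2 x 1 product
  res_row[i, ] <- A[i, ] %*% N0   # row i of A times N0
}
```

Which of the two results is the intended one still depends on what the original calculation is supposed to represent.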

Vectorization of findInterval()

I have following problem with R function findInterval()
Given a vector X and a matrix Y, I want to find in which interval lie elements of X. Intervals are constructed, having breakpoints in Y rows. In other words for X = c(2,3) and Y = matrix(c(3,1,4,2,5,4),2,3), the output would be c(0,2). I wrote following code:
X <- c(2,3)
Y <- matrix(c(3,1,4,2,5,4),2,3)
output <- diag(apply(Y,1,function(z)findInterval(X,z)))
and it works. However, I think it can be optimised, since the apply function returns a 2 x 2 matrix (that's why I had to take its diagonal). Is there a way to do the same with a function that takes my vector X and matrix Y as arguments and returns a vector? I perform this operation on high-dimensional vectors, so obtaining unnecessary matrices of size 10000 x 10000 is not a good idea, in my opinion. To maximize efficiency, I don't want to use loops.
Thanks in advance for any feedback.
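You can do the following, assuming each row of Y is sorted in increasing order (>= rather than > matches findInterval() when an element of X equals a breakpoint):
rowSums(X >= Y)
# [1] 0 2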
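As a quick sanity check, the row-sum trick can be compared with calling findInterval() one row at a time. This sketch assumes the rows of Y are sorted in increasing order and uses >=, because findInterval() counts a breakpoint equal to the value as already passed. The names by_row and by_sum are made up for this illustration.

```r
X <- c(2, 3)
Y <- matrix(c(3, 1, 4, 2, 5, 4), 2, 3)

# Reference: findInterval() applied to X[i] with the breakpoints in row i of Y
by_row <- vapply(seq_along(X),
                 function(i) findInterval(X[i], Y[i, ]),
                 integer(1))

# Vectorized: X is recycled down the columns, so X[i] is compared with row i
by_sum <- rowSums(X >= Y)

# A tie shows why >= is used rather than >
tie_fi  <- findInterval(3, c(3, 4, 5))
tie_sum <- sum(3 >= c(3, 4, 5))
```

The recycling only lines up this way when length(X) equals nrow(Y).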

Bootstrapping two datasets in R

I have two dataframes as follows:
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow=10))
where the rows represent the genes and the columns are the genotypes.
For each round of bootstrapping (n=1000), genotypes should be selected at random without replacement from this dataset (X) and form two groups of datasets (X' should have 5 genotypes and Y' should have 5 genotypes). Basically, in the end I will have thousand such datasets X' and Y' which will contain 5 random genotypes each from the full expression dataset.
I tried using replicate and apply but did not work.
B <- 1000
replicate(B, apply(X, 2, sample, replace = FALSE))
I think it might make more sense for you to first select the column numbers, 10 from 200 without replacement (five for each X' and Y'):
colnums_boot <- replicate(1000,sample.int(200,10))
From there, as you evaluate each iteration, i from 1 to 1000, you can grab
Xprime <- X[,colnums_boot[1:5,i]]
Yprime <- X[,colnums_boot[6:10,i]]
This saves you from making a 3-dimensional array (the generalization of matrix in R).
Also, if speed is a concern, I think it would be much faster to leave X as a matrix instead of a data frame. Maybe someone else can comment on that.
EDIT: Here's a way to grab them all up-front (in a pair of three-dimensional arrays):
Z <- as.matrix(X)
Xprimes <- array(,dim=c(10,5,1000))
Xprimes[] <- Z[,colnums_boot[1:5,]]
Yprimes <- array(,dim=c(10,5,1000))
Yprimes[] <- Z[,colnums_boot[6:10,]]
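To convince yourself that the slices line up with the per-iteration selection, you can compare one of them directly. A small sketch under the same setup (10 genes, 200 genotypes, 1000 resamples); here the arrays are built in a single array() call, and the iteration number 7 is arbitrary.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
Z <- as.matrix(X)
B <- 1000

# Column i holds the 10 genotype columns drawn for iteration i
colnums_boot <- replicate(B, sample.int(ncol(Z), 10))

# The index matrix is read column by column, so slice i gets iteration i
Xprimes <- array(Z[, colnums_boot[1:5, ]],  dim = c(10, 5, B))
Yprimes <- array(Z[, colnums_boot[6:10, ]], dim = c(10, 5, B))

i <- 7
same_X <- all(Xprimes[, , i] == Z[, colnums_boot[1:5, i]])
same_Y <- all(Yprimes[, , i] == Z[, colnums_boot[6:10, i]])

# Within one iteration, X' and Y' never share a genotype
overlap <- intersect(colnums_boot[1:5, i], colnums_boot[6:10, i])
```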
