Strange behavior in `matrix` in R

I'm getting strange behavior: the matrix dimensions are not what I expect. Here is a toy example:
n <- 10
delt <- 0.00001
s <- n/delt + 1
print(s)
s = 1000001
x <- matrix(0, nrow = s, ncol = 2)
dim(x)
1000000 2
However, if I type
x <- matrix(0, nrow = 1000001, ncol = 2)
dim(x)
I get what I expect 1000001 2

This is why:
print(s,digits=20L); ## s is slightly under 1000001 after all
## [1] 1000000.9999999998836
as.integer(s); ## truncates to 1000000
## [1] 1000000
The documentation on matrix() doesn't explicitly say it, but the nrow and ncol arguments are internally coerced to integer.
Also see Why are these numbers not equal?.
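One way to guard against this is sketched below, assuming the intended row count is the nearest whole number: round the computed value before passing it to matrix(). The name s_int is introduced here for illustration only.

```r
n <- 10
delt <- 0.00001
s <- n / delt + 1   # mathematically 1000001, but may be stored slightly below it

# Round to the nearest whole number before the value is truncated to integer
s_int <- as.integer(round(s))

x <- matrix(0, nrow = s_int, ncol = 2)
dim(x)
```

Rounding is only appropriate when the value is known to be a whole number up to floating-point error; it would hide a genuinely fractional row count.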

Related

Efficient way to check a large sparse matrix for non-finite values

I have a large sparse matrix. After populating the matrix with some math, I realized I had some infinite values due to a division by zero error. How can I check this matrix for non-finite values?
Here is a toy matrix.
A <- Matrix(nrow = 150000, ncol = 150000, data = 0, sparse = TRUE)
A[1, 1] = Inf
A[1, 3] = NA
A[2, 1] = -Inf
Trying to find its non-finite values gives me an error:
test <- A[!is.finite(A)]
#Error: cannot allocate vector of size 83.8 Gb
I also tried scanning this matrix row by row but it takes forever.
library(magrittr)
for(i in 1:nrow(A)){
if((
A[i, ] %>% .[!is.finite(.)] %>% length
) > 0) print(i)
}
I then tried running it in parallel, but I think it is overkill. What's more, it still takes a long time.
library(parallel)
library(magrittr)
numCores <- detectCores() - 1
cl <- makeCluster(numCores)
clusterExport(cl, c("A"))
clusterEvalQ(cl, library(magrittr))
out <- A %>% nrow %>% seq %>% parLapply(cl, X = ., function(i) A[i, ] %>% .[!is.finite(.)])
How can I proceed?
If we want to know if a sparse matrix A has any Inf, -Inf, NaN or NA, we can do
any(!is.finite(A@x))
#[1] TRUE
If we also want to know their positions, we can do
subset(summary(A), !is.finite(x))
  i j    x
1 1 1  Inf
2 2 1 -Inf
3 1 3   NA
Remark:
See R: element-wise matrix division for distinctions between is.infinite, !is.finite, is.na and is.nan.
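The sketch below puts the two steps together on a small matrix, assuming the Matrix package is installed. The matrix is built with sparseMatrix() rather than by assignment, and the names has_bad and bad are illustrative only. The check is cheap because the x slot holds only the explicitly stored entries, not the implicit zeros.

```r
library(Matrix)

# Small sparse matrix with three non-finite entries and one ordinary entry
A <- sparseMatrix(i = c(1, 1, 2, 4),
                  j = c(1, 3, 1, 4),
                  x = c(Inf, NA, -Inf, 2.5),
                  dims = c(5, 5))

# Does any stored entry fail is.finite()? (covers Inf, -Inf, NaN and NA)
has_bad <- any(!is.finite(A@x))

# Row index, column index and value of each offending entry
bad <- subset(summary(A), !is.finite(x))
bad
```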

Saving quantities in a for loop

I am having problems when saving the results in a for loop.
I am computing a variance (this is not relevant I think) and my code is:
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
for (j in ncol(data_zeros)){
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
In particular, alpha is a vector with 20 values, sum_alpha is a scalar, data_zeros is my dataset with 20 rows and 50 columns, and N is the sum of each column of the dataset, so it is a vector with 50 values.
What I want to do seems very simple:
I want to get a list of 50 vectors, where each one differs from the others only in that it is multiplied by a different value of N.
I really hope that somebody can help me find the error.
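The problem is (probably) that you recompute the constants every time j changes, and that in each step you clear the list with the line var_s_dirm <- list(). Note also that for (j in ncol(data_zeros)) iterates over the single value ncol(data_zeros) rather than over 1:ncol(data_zeros).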
See if this works for you
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
for (j in 1:ncol(data_zeros)){
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
output
var_s_dirm
[[1]]
[1] 2.614833 2.327105 2.500483 3.047700 2.233528 2.130223 2.700103 2.869699 2.930213 2.575903 2.198459 2.846096
[13] 2.425448 3.517559 3.136266 2.565345 2.578267 2.763113 2.709707 3.420792
[[2]]
[1] 2.568959 2.286279 2.456615 2.994231 2.194343 2.092850 2.652732 2.819353 2.878806 2.530712 2.159889 2.796165
[13] 2.382897 3.455848 3.081244 2.520339 2.533034 2.714637 2.662168 3.360778
[[3]]
[1] 3.211199 2.857849 3.070769 3.742790 2.742930 2.616064 3.315916 3.524193 3.598509 3.163391 2.699862 3.495207
[13] 2.978622 4.319811 3.851556 3.150424 3.166294 3.393297 3.327711 4.200974
....
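As an alternative to filling the list in a loop, the same computation can be sketched with lapply() over N. The values of alpha, sum_alpha and N below are made-up stand-ins, not output from dirmult, so that the example runs without the package.

```r
# Hypothetical stand-ins for the fitted quantities (not from the original data)
alpha <- c(0.5, 1.0, 1.5)
sum_alpha <- sum(alpha)      # assumed scalar, as in the question
N <- c(10, 20, 30, 40)       # one total per column

# Constants that do not depend on j are computed once, outside any loop
A <- alpha / sum_alpha
B <- 1 - A
C <- 1 + sum_alpha

# One vector per element of N
var_s_dirm <- lapply(N, function(Nj) Nj * A * B * ((Nj + sum_alpha) / C))

length(var_s_dirm)
```

The result is a list with one element per value of N, each of the same length as alpha.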

Optimization of values in a matrix

I typically use Rsolnp for optimization but I am having trouble figuring out how to ask R to find values to fill a matrix (instead of a vector). Is that possible with Rsolnp or any other optimizer?
Here is a simplified example that is not working:
library(Rsolnp)
a<-matrix(rnorm(9), ncol=3)
b<-matrix(rnorm(9), ncol=3)
f1<-function(H) {
return(sum(H*a))
}
f2<-function(H) {
return(sum(H*b))
}
lH<-matrix(rep(0, 9), ncol=3)
uH<-matrix(rep(1, 9), ncol=3)
pars<-uH
target<-1.2
sol <- gosolnp(pars, fixed=NULL, fun=f1, eqfun=f2, eqB=target, LB=lH, UB=uH, distr=uH, n.restarts=10, n.sim=20000, cluster= NULL)
As you can see from the output, Rsolnp seems to be confused by the request:
> sol
$values
[1] 1e+10
$convergence
[1] 0
$pars
[1] NA NA NA NA NA NA NA NA NA
$start.pars
[1] 0.90042133 0.33262541 0.94586530 0.02083822 0.99953060 0.10720068 0.14302770 0.67162637 0.25463806
$rseed
[1] 1487866229
It seems that gosolnp() does not work with matrices. I went through the function in debugging mode and there is a call of solnp() that fails with the message:
Error in pb/cbind(vscale[(neq + 2):(neq + mm + 1)], vscale[(neq + 2):(neq + :
non-conformable arrays
But since a matrix is just a vector with the dimension attribute set, you can always reformulate your problem in terms of vectors. In your case, this is very easy, because you never do something that actually requires a matrix (like, for instance, a matrix product). Just omitting matrix() everywhere works fine.
But I assume that this is just a property of your simplified problem and your actual problem indeed needs to be expressed in terms of matrices. You could get around the problem by converting your vectors into matrices only inside the functions f1() and f2() as follows:
f1 <- function(H) {
return(sum(matrix(H, ncol = 3) * a))
}
f2 <- function(H) {
return(sum(matrix(H, ncol = 3) * b))
}
You can then define a and b as matrices as before, but lH and uH must be vectors:
a <- matrix(rnorm(9), ncol=3)
b <- matrix(rnorm(9), ncol=3)
lH <- rep(0, 9)
uH <- rep(1, 9)
pars <- uH
target <- 1.2
And now you can call gosolnp():
sol <- gosolnp(pars, fixed = NULL, fun = f1, eqfun = f2,
eqB = target, LB = lH, UB = uH, distr = uH,
n.restarts = 10, n.sim = 20000, cluster = NULL)
sol$pars
## [1] 3.917819e-08 9.999997e-01 4.748336e-07 1.000000e+00 5.255060e-09 5.114680e-10
## [7] 4.899963e-01 1.000000e+00 9.260947e-08
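The reshaping trick can be checked in isolation, without Rsolnp. The sketch below flattens a matrix to a vector, rebuilds it inside the objective function, and confirms that the value is unchanged. H_mat and H_vec are illustrative names.

```r
set.seed(1)
a <- matrix(rnorm(9), ncol = 3)

H_mat <- matrix(runif(9), ncol = 3)
H_vec <- as.vector(H_mat)          # column-major flattening

# Objective that accepts a plain vector and restores the matrix shape internally
f1 <- function(H) {
  sum(matrix(H, ncol = 3) * a)
}

# matrix() fills column by column, so the round trip recovers the original
round_trip_ok <- identical(matrix(H_vec, ncol = 3), H_mat)
same_value <- isTRUE(all.equal(f1(H_vec), sum(H_mat * a)))
c(round_trip_ok, same_value)
```

After the optimizer returns, matrix(sol$pars, ncol = 3) would give the solution back in matrix form under the same column-major convention.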

R Cran ldei: error in svd(V2, nu = 0, nv = unsolvable) : a dimension is zero

I am quite new to R-Cran. I would like to solve a linear inverse model with constraints. I am using the ldei function in the limSolve package.
Here are my linear system and constraints:
A x X = C
G x X >=H, with G=I and H=0. (Basically: X>=0)
where:
A is a 2x2 matrix;
X (nrow=2, ncol=n) is the unknown-vector-(matrix);
C (nrow=2, ncol=n) is the constant-known vector-(matrix).
The matrix G is the identity matrix and H is a vector-(matrix) with zero values.
The constraint X>=0 needs to be satisfied because X represents concentrations of nitrogen dioxide and ozone in outdoor air, and they cannot be negative.
Here is my R code, assuming n=10:
library(limSolve)
A <- matrix(data = NA, nrow=2,ncol=2)
A[1,c(1:2)] <- c(-3.956946e-05,-1.558643e-05)
A[2,c(1:2)] <- c(-8.785099e-05, 1.540414e-04)
ctmp1 <- c(-3.000286e-04,-0.0003545647,-0.0002958569,-0.000356863,-0.0003602479,-0.0004177914,-0.0004280350,-0.0003890670,-0.0004984785,-0.0005695379)
ctmp2 <- c(-6.462205e-05,-0.0007740174,-0.0006427914,-0.001056369,-0.0009569179,-0.0008562010,-0.0005402486,-0.0005043381,-0.0006366220,-0.0009332219)
inC <- rbind(ctmp1,ctmp2)
C <- matrix(data=inC, nrow=2, ncol=10)
G <- matrix(ncol=2,nrow=2,data= c(1,0,0,1))
inH <- rbind(rep(0., length.out=10),rep(0., length.out=10))
H <- matrix(data=inH, nrow=2, ncol=10)
# I am aware I need to use the apply-family instead of do loops-- this is my work in progress--..
for (i in 1:10){
print(i)
E <- A
FF <- c(C[1,i],C[2,i])
GG <- G
HH <- H[,i]
res <-ldei(E,FF,GG,HH)
print(res$X)
}
### Here the output:
[1] 1
[1] 6.326385 3.188475
[1] 2
[1] 8.93305028 0.06985077
[1] 3
[1] 7.44753875 0.07454004
[1] 4
Error in svd(V2, nu = 0, nv = unsolvable) : a dimension is zero
I would be very grateful for any hint to solve the issue:
"Error in svd(V2, nu = 0, nv = unsolvable) : a dimension is zero"
Many thanks for your help!
Firstly, welcome to R!
If you want to see the code for a function, you can just type the name of the function without parentheses and R shows the code. So, for svd, we can see that it gives the error you see when a dimension is zero... which means as.matrix(x) hasn't given an object with the expected dimensions, which means its input x must not be quite as expected.
You can edit these functions too... for debugging purposes. I normally just paste into a text editor, edit and paste back into R.
svd = function (x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)
{
x <- as.matrix(x)
if (any(!is.finite(x)))
stop("infinite or missing values in 'x'")
dx <- dim(x)
n <- dx[1L]
p <- dx[2L]
if (!n || !p) {
print("x")
print(x)
print("dx")
print(dx)
print("dx1L")
print (dx[1L])
print("dx2L")
print(dx[2L])
stop("a dimension is zero");}
La.res <- La.svd(x, nu, nv)
res <- list(d = La.res$d)
if (nu)
res$u <- La.res$u
if (nv) {
if (is.complex(x))
res$v <- Conj(t(La.res$vt))
else res$v <- t(La.res$vt)
}
res
}
I've made it print out the values of interest...
[1] "x"
[1,]
[2,]
[1] "dx"
[1] 2 0
[1] "dx1L"
[1] 2
[1] "dx2L"
[1] 0
As suspected, the input has no data...
I pulled the same trick in ldei to get the values of E and V2 (which are the x given to svd). I won't put that code here as it's a bit much.
The upshot of it is that rnd in
V2 <- V2 %*% rnd
is width 0 because
unsolvable <- Nx - solvable
...
rnd <- matrix(data = runif(Nx * unsolvable), nrow = Nx,
ncol = unsolvable)
and both Nx and solvable are equal to 2!
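The effect can be reproduced in a few lines of base R. In this sketch, V2 is a hypothetical 2 x 2 stand-in (the identity matrix), not the value computed inside ldei. Only the shapes matter.

```r
Nx <- 2
solvable <- 2
unsolvable <- Nx - solvable          # 0

# With unsolvable == 0 this is a matrix with 2 rows and 0 columns
rnd <- matrix(data = runif(Nx * unsolvable), nrow = Nx, ncol = unsolvable)
dim(rnd)

V2 <- diag(2)                        # hypothetical stand-in for the real V2
V2 <- V2 %*% rnd                     # (2 x 2) %*% (2 x 0) gives a 2 x 0 matrix

# Capture the error message instead of stopping
msg <- tryCatch(
  {
    svd(V2, nu = 0, nv = unsolvable)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```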
The reason it even reaches that part of the code is that not all of CC are bigger than -tol... in this case:
[1] "CC"
[,1]
[1,] 9.570040
[2,] -1.399828
[1] "-tol"
[1] -1.490116e-08
Ultimately, the difference in CC is because of the value of F.
So my guess is that the case where FF = c(C[1,4],C[2,4]) produces an error that is not handled by the function.
Sorry I can't do better!
Following the tips Jimi provided, for which I am really grateful, I contacted the maintainer directly. Below are her suggestions to "skip" the error. They worked for my case.
"The error means that the system is not solvable with ldei, and the function does not catch the error. What you can do is:
for (i in 1:10){
print(i)
E <- A
FF <- c(C[1,i],C[2,i])
GG <- G
HH <- H[,i]
res <-try(ldei(E,FF,GG,HH))
if (! class(res)%in% "try-error")
print(res$X)
}
Therefore, when class(res)%in% "try-error", then it is not solvable (likely at the requested precision)"
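The same pattern can be sketched without limSolve. Here solve_one is a hypothetical stand-in for a solver call that fails on some inputs. The failure is detected with inherits(), which is a substitution for the class(res) %in% test in the quoted code and serves the same purpose. Results are collected in a list so that failed cases remain identifiable afterwards.

```r
# Hypothetical stand-in for a solver that errors on some inputs
solve_one <- function(v) {
  if (v < 0) stop("system not solvable")
  sqrt(v)
}

inputs <- c(4, -1, 9)
results <- vector("list", length(inputs))

for (i in seq_along(inputs)) {
  # silent = TRUE suppresses the printed error message
  res <- try(solve_one(inputs[i]), silent = TRUE)
  if (!inherits(res, "try-error")) {
    results[[i]] <- res      # failed cases are left as NULL
  }
}

# Indices of the inputs that could not be solved
failed <- which(vapply(results, is.null, logical(1)))
failed
```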

R Sum every k columns in matrix

I have a matrix temp1 (dimensions Nx16) (generally, NxM)
I would like to sum every k columns in each row to one value.
Here is what I got to so far:
cbind(rowSums(temp1[,c(1:4)]), rowSums(temp1[,c(5:8)]), rowSums(temp1[,c(9:12)]), rowSums(temp1[,c(13:16)]))
There must be a more elegant (and generalized) method to do it.
I noticed a similar question here:
sum specific columns among rows
but I couldn't make it work with Ananda's solution.
I got the following error:
sapply(split.default(temp1, 0:(length(temp1)-1) %/% 4), rowSums)
Error in FUN(X[[1L]], ...) :
'x' must be an array of at least two dimensions
Please advise.
You can use by:
do.call(cbind, by(t(temp1), (seq(ncol(temp1)) - 1) %/% 4, FUN = colSums))
If the dimensions are equal for the sub matrices, you could change the dimensions to an array and then do the rowSums
m1 <- as.matrix(temp1)
n <- 4
dim(m1) <- c(nrow(m1), ncol(m1)/n, n)
res <- matrix(rowSums(apply(m1, 2, I)), ncol=n)
identical(res[,1],rowSums(temp1[,1:4]))
#[1] TRUE
Or if the dimensions are unequal
t(sapply(seq(1,ncol(temp2), by=4), function(i) {
indx <- i:(i+3)
rowSums(temp2[indx[indx <= ncol(temp2)]])}))
data
set.seed(24)
temp1 <- as.data.frame(matrix(sample(1:20, 16*4, replace=TRUE), ncol=16))
set.seed(35)
temp2 <- as.data.frame(matrix(sample(1:20, 17*4, replace=TRUE), ncol=17))
Another possibility:
x1<-sapply(1:(ncol(temp1)/4),function(x){rowSums(temp1[,1:4+(x-1)*4])})
## check
x0<-cbind(rowSums(temp1[,c(1:4)]), rowSums(temp1[,c(5:8)]), rowSums(temp1[,c(9:12)]), rowSums(temp1[,c(13:16)]))
identical(x1,x0)
# TRUE
Here's another approach. Convert the matrix to an array and then use apply with sum.
n <- 4
apply(array(temp1, dim=c(dim(temp1)/c(1,n), n)), MARGIN=c(1,3), FUN=sum)
Using @akrun's data
set.seed(24)
temp1 <- matrix(sample(1:20, 16*4, replace=TRUE), ncol=16)
A function that sums the matrix columns in groups of n columns each:
set.seed(1618)
mat <- matrix(rnorm(24 * 16), 24, 16)
f <- function(mat, n = 4) {
if (ncol(mat) %% n != 0)
stop()
cols <- split(colSums(mat), rep(1:(ncol(mat) / n), each = n))
## or use this to have n mean the number of groups you want
# cols <- split(colSums(mat), rep(1:n, each = ncol(mat) / n))
sapply(cols, sum)
}
f(mat, 4)
# 1 2 3 4
# -17.287137 -1.732936 -5.762159 -4.371258
c(sum(mat[,1:4]), sum(mat[,5:8]), sum(mat[,9:12]), sum(mat[,13:16]))
# [1] -17.287137 -1.732936 -5.762159 -4.371258
More examples:
## first 8 and last 8 cols
f(mat, 8)
# 1 2
# -19.02007 -10.13342
## each group is 16 cols, ie, the entire matrix
f(mat, 16)
# 1
# -29.15349
sum(mat)
# [1] -29.15349
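For the row-wise sums asked about in the question, one more option is base R's rowsum(), which sums rows by group. Applied to the transposed matrix it sums columns by group. This is a sketch with illustrative names (m, k, grp), and it also handles a final group with fewer than k columns.

```r
set.seed(24)
m <- matrix(sample(1:20, 16 * 4, replace = TRUE), ncol = 16)
k <- 4

# Group label for each column: 0,0,0,0,1,1,1,1,...
grp <- (seq_len(ncol(m)) - 1) %/% k

# rowsum() sums rows by group, so transpose, sum, and transpose back
res <- t(rowsum(t(m), grp))

# Compare with the explicit version from the question
expected <- cbind(rowSums(m[, 1:4]), rowSums(m[, 5:8]),
                  rowSums(m[, 9:12]), rowSums(m[, 13:16]))
all(unname(res) == expected)
```

The result has one row per row of m and one column per group, with the group labels as column names.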
