I'd like to generate an N by n matrix, with each column having a different p value:
N=100000
n=100
p= seq(0.005,0.995,.01)
xsim=rbinom(N, size = n, prob = p)
ysim=rbinom(N, size = n, prob = p)
This does not appear to have the correct dimensions; I want each column to have a different p. Is there a way to do this with apply or otherwise? Thanks.
sapply(p, rbinom, n = N, size = n)
rbinom recycles a vector prob across the draws and always returns a plain vector of length N, not a matrix, so call it once per value of p.
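As a rough sketch of the sapply approach, the snippet below builds the matrix column by column and also shows a single vectorised call that should give a matrix with the same layout. N is reduced here only so the example runs quickly, and the names xsim2 and pr are illustrative, not from the question.

```r
set.seed(42)
N <- 1000                      # smaller than the question's 100000, for a quick run
n <- 100
p <- seq(0.005, 0.995, 0.01)   # 100 probabilities

# One rbinom() call per probability; each call fills one column
xsim <- sapply(p, function(pr) rbinom(N, size = n, prob = pr))

# Single vectorised call: repeat each probability N times so that
# column-major filling puts p[j] in column j
xsim2 <- matrix(rbinom(N * length(p), size = n, prob = rep(p, each = N)),
                nrow = N)

dim(xsim)    # N rows, one column per probability
dim(xsim2)
```

Either form gives one column per probability; colMeans(xsim) / n should then sit close to p.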
I want to solve an optimization problem to find the best weights for groups of vectors. Could you give some suggestions about how to solve it in R? Thanks very much.
The problem is as follows.
Suppose there are N groups, and we know the similarity matrix S among these N groups. The dimension of S is N*N.
In each group, there are K vectors. Each vector has M elements, each of which is 0 or 1.
We can fit an average vector based on these K vectors. For example, average vector
Based on the average vector of each group, we can calculate the correlation among these average vectors.
The objective is to minimize the difference between the correlation matrix C and the known similarity matrix S.
Because you didn't provide any data, I will generate random data and demonstrate a way you can approach your problem.
Similarity matrix:
N <- 6
S <- matrix(runif(N^2, -1, 1), ncol = N, nrow = N)
similarity_matrix <- (S + t(S)) / 2
N is the number of groups. Each value of the similarity matrix is between -1 and 1, and the matrix is symmetric (because you want to compare it to a covariance matrix, this makes sense).
group vectors:
M <- 10
K <- 8
group_vectors <- replicate(N, replicate(K, sample(c(0, 1), M, TRUE)), FALSE)
M is the dimension of each vector and K is the number of binary vectors in each group.
fitness function
fitness <- function(W, group_vectors, similarity_matrix){
W <- as.data.frame(matrix(W, nrow = K, ncol = N))
SS <- cov(
mapply(function(x,y) rowSums(sweep(x, 2, y, "*")), group_vectors, W)
)
sum(abs(SS - similarity_matrix))
}
For given weights, fitness calculates the covariance matrix described above and its distance from similarity_matrix.
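Before handing the objective to an optimizer, it can help to evaluate it once by hand. The sketch below is an assumption-laden variant, not the function above: fitness2 is a hypothetical name, it reads K and N from its inputs instead of from global variables, and it uses a matrix product in place of sweep and rowSums, which should give the same weighted sums. The sizes are deliberately small.

```r
set.seed(1)
N <- 4; M <- 10; K <- 3            # small sizes, chosen only for illustration

# Symmetric target matrix with entries in [-1, 1]
S <- matrix(runif(N^2, -1, 1), nrow = N)
similarity_matrix <- (S + t(S)) / 2

# List of N matrices, each M x K, with one binary vector per column
group_vectors <- replicate(N, replicate(K, sample(c(0, 1), M, TRUE)),
                           simplify = FALSE)

# Hypothetical variant of the objective: sizes are taken from the inputs
fitness2 <- function(W, group_vectors, similarity_matrix) {
  K <- ncol(group_vectors[[1]])
  N <- length(group_vectors)
  W <- matrix(W, nrow = K, ncol = N)
  # Column g holds the weighted sum of the K vectors in group g
  combined <- sapply(seq_len(N),
                     function(g) drop(group_vectors[[g]] %*% W[, g]))
  sum(abs(cov(combined) - similarity_matrix))
}

w0 <- runif(K * N, -1, 1)          # one random candidate weight vector
val <- fitness2(w0, group_vectors, similarity_matrix)
val
```

With all weights set to zero the combined vectors are all zero, so the value reduces to sum(abs(similarity_matrix)), which makes a convenient sanity check.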
differential evolution approach
res <- DEoptim::DEoptim(
fn = fitness,
lower = rep(-1, K*N),
upper = rep(1, K*N),
group_vectors = group_vectors,
similarity_matrix = similarity_matrix,
control = DEoptim::DEoptim.control(VTR = 0, itermax = 1000, trace = 50, NP = 100)
)
W <- matrix(res$optim$bestmem, nrow = K, ncol = N)
genetic algorithm approach
res <- GA::ga(
type = "real-valued",
fitness = function(W, ...) -fitness(W, ...),
lower = rep(-1, K*N),
upper = rep(1, K*N),
group_vectors = group_vectors,
similarity_matrix = similarity_matrix,
maxiter = 10000,
run = 200
)
W <- matrix(res@solution[1,], nrow = K, ncol = N)
I would like to draw nrep times from a binomial distribution with parameter theta, creating one sequence of length k for each theta, and collect the sequences in a matrix of dimension nrep x k. How can I create a matrix of the results in R?
The code below makes (nrep * k) draws, each from a different theta, i.e. a sequence is not k draws from the same theta. [My aim is to draw, nrep times, a binary sequence of length k with a single probability theta.]
### simulate some binary sequence data in matrix ted (1000 x 20)
nrep <- 1000
s <- 7; k <- 20
theta <- rbeta(nrep, shape1=s+1, shape2=k-s+1)
ted <- 0
ted <- matrix(rbinom(k * nrep, 1, theta), ncol = k, nrow = nrep)
hist(ted)
rbinom is vectorized over the prob parameter, so you can use rep(theta, k) to achieve this. Since matrix() fills column by column, row i of ted is then drawn with theta[i]:
ted <- matrix(rbinom(k * nrep, 1, rep(theta, k)), ncol = k, nrow = nrep)
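One way to check which theta each row uses is to pick extreme probabilities, so that the outcome is deterministic. This is only an illustrative sketch; the four-element theta below is made up for the check and replaces the rbeta draw.

```r
set.seed(1)
k <- 5
theta <- c(0, 1, 0, 1)          # extreme values make the row structure visible
nrep <- length(theta)

ted <- matrix(rbinom(k * nrep, 1, rep(theta, k)), nrow = nrep, ncol = k)

# Rows with theta = 0 contain only zeros, rows with theta = 1 only ones
rowSums(ted)                    # 0 5 0 5
```

Each row therefore holds k draws that share a single theta.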
I have written the following code which should generate an output matrix of 1 row and 10 columns. This did work properly before I corrected my loop by replacing nchrom labels with q, but now generates the following error code:
Error in rmultinom(1, size * q/2, prob = c(num_chrom)) :
no positive probabilities
If anyone can help me identify where the problem in the code is, I would be highly appreciative. My code at the moment is as follows:
randomdiv <- function(nchrom, ndivs, size) {
chrom <- matrix(nrow = 1, ncol = ndivs)
{q <- nchrom
for (i in 1:ndivs)
{
{sz <- matrix(nrow = nchrom, ncol = ndivs)
for (j in 1:nchrom) {
n <- size
for (i in 1:ndivs)
{
old_subs <- rbinom (1, n, 0.5) #roughly halving the number of subunits per chromosome, representing segregation of chromosomes
num_chrom <- rep(1 / q, q) #vector to determine probabilities for multinomial - based on number of chromosomes per cell
new_subs <- rmultinom(1, size * q / 2, prob = c(num_chrom)) #multinomial to generate randomness in number of new subunits translated per cell (based on ideal being half of the total subunit pool)
total_subs <- cbind(old_subs, new_subs) #required step to allow ifelse function to properly work on individual rows
m <- as.matrix(ifelse(total_subs[,1]>0, total_subs[,1] + total_subs[,2], total_subs[,1])) #ifelse function to ensure that if a chromosome reaches 0 subunits, there will be no new subunits added to that chromosome
zeros <- colSums(m==0) #calculates number of zeros in the columns of m and will form a vector (2 values will be shown - only interested in the first for m[1,1])
k <- c(-1, 1)
s <- sample(k, zeros[1], replace = TRUE) #random samples taken from -1 and 1 - number of samples is equal to the number of zeros that have occurred
new_nchrom <- q + sum(s) #Sum of samples determines the number of chromosomes to add or remove from the cell (random element)
chrom[,i] <- new_nchrom #Inserts new number of chromosomes into the matrix for output
q <- new_nchrom
sz[j,i] <- m[1,1] #puts in m matrix as the next column in sz matrix - need to keep a matrix of subunit numbers because the number of subunits reaching 0 determines changes in chromosome number
n <- m
}
}
}
}
}
return (chrom)
}
>randomdiv(10, 10, 3)
I do not understand why this is happening. I have a data matrix of dimension 64x6830. When I do the following:
pr.out=prcomp(data,scale=TRUE)
dim(pr.out$rotation)
# [1] 6830 64
I do not understand why the rotation matrix is not 6830x6830. When I take a subset of the data like this:
data1=data[1:nrow(data),1:10]
pr.data=prcomp(data1,scale=TRUE)
dim(pr.data$rotation)
# [1] 10 10
So a smaller subset of the same data gives the expected result, but I am clueless as to why the rotation matrix has different dimensions when computed on the whole dataset.
The function prcomp is based on the function svd:
svd(x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)
From edit(stats:::prcomp.default), we see:
s <- svd(x, nu = 0)
This means that the left singular vectors are not computed. Thus, in the case of prcomp, svd only returns "a vector containing the singular values of x, of length min(n, p)" and "a matrix whose columns contain the right singular vectors of x [...]. Dimension c(p, nv)"
If we go back to the call of svd, nv is defined as nv = min(n, p) (the minimum of n and p), where n = nrow(x) and p = ncol(x).
In the case of "data", n = 64 and p = 6830. Then nv = 64 and pr.out$rotation is a 6830x64 (p x nv) matrix.
In the case of "data1", n = 64 and p = 10. Then nv = 10 and pr.data$rotation is a 10x10 (p x nv) matrix.
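The same behaviour can be reproduced on a small random matrix. This is a sketch on made-up data (x, pc and pc_small are illustrative names), with fewer rows than columns as in the question; scale. is the full name of the argument abbreviated as scale above.

```r
set.seed(1)
x <- matrix(rnorm(6 * 20), nrow = 6, ncol = 20)   # n = 6 rows, p = 20 columns

pc <- prcomp(x, scale. = TRUE)
dim(pc$rotation)          # p x min(n, p), here 20 x 6

# With fewer columns than rows the rotation matrix is square
pc_small <- prcomp(x[, 1:4], scale. = TRUE)
dim(pc_small$rotation)    # here 4 x 4
```

With only n observations there are at most n directions of variation to report, which is why the number of columns is capped at min(n, p).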
I need to speed up the calculation of the inverse of a WLS covariance matrix in R, where the matrix, wls.cov.matrix, is given by (full example below):
n = 10000
X = matrix(c(rnorm(n,1,2), sample(c(1,-1), n, replace = TRUE), rnorm(n,2,0.5)), nrow = n, ncol = 3)
Q = diag(rnorm(n, 1.5, 0.3))
wls.cov.matrix = solve(t(X)%*%diag(1/diag(Q))%*%X)
Is it possible to speed up this calculation?
MORE INFO VERY RELEVANT TO THE FINAL GOAL:
That is still not much information, so let me explain my goal in more detail; it should make clearer whether there are ways to speed up my code.
I compute wls.cov.matrix thousands of times, so I need it to be much faster.
However, each time I run it I use the same X, the only matrix that changes is Q, which is a diagonal matrix.
If X was a square matrix, of same dim as Q, I could just precompute X^-1 and (X^T)^(-1),
X.inv = solve(X)
X.inv.trans = solve(t(X))
and then for each iteration run:
Q.inv = diag(1/diag(Q))
wls.cov.matrix = X.inv%*%Q.inv%*%X.inv.trans
But my X is not square, so is there any other trick?
The main time-consuming part here is t(X)%*%diag(1/diag(Q))%*%X, not the calculation of its inverse.
A nice trick is to calculate it as
crossprod(X / sqrt(diag(Q)));
Confirmation:
all.equal( (t(X) %*% diag(1/diag(Q)) %*% X) , crossprod(X / sqrt(diag(Q))) );
[1] TRUE
To compare the timing run:
Qdiag = diag(Q);
system.time({(t(X) %*% diag(1/Qdiag) %*% X)})
system.time({crossprod(X / sqrt(Qdiag))})
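Since only the diagonal of Q changes between iterations, one possible arrangement is to keep that diagonal as a plain vector and never build the n x n matrix at all. The sketch below assumes this setup: wls_cov and q are hypothetical names, n is reduced, and the diagonal is drawn with runif so that every entry is strictly positive, which the square root requires.

```r
set.seed(1)
n <- 500                                    # hypothetical size, kept small
X <- cbind(rnorm(n, 1, 2),
           sample(c(1, -1), n, replace = TRUE),
           rnorm(n, 2, 0.5))

# Hypothetical helper: q is the diagonal of Q as a plain numeric vector.
# X / sqrt(q) divides row i of X by sqrt(q[i]).
wls_cov <- function(X, q) solve(crossprod(X / sqrt(q)))

q <- runif(n, 0.5, 2.5)                     # strictly positive diagonal entries
fast <- wls_cov(X, q)

# Reference computation that builds the full n x n diagonal matrix
slow <- solve(t(X) %*% diag(1 / q) %*% X)
all.equal(fast, slow)
```

Inside a loop, only q would need to be regenerated on each iteration, while X stays fixed.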
Well, Q is a diagonal matrix, so its inverse is just given by the inverses of the diagonal entries. You can thus do
X = matrix(c(rnorm(n,1,2), sample(c(1,-1), n, replace = TRUE), rnorm(n,2,0.5)), nrow = n, ncol = 3)
Qinv = diag(1/rnorm(n, 1.5, 0.3))
wls.cov.matrix = solve(t(X)%*%Qinv%*%X)
And in fact, this speeds things up by about a factor of 20.