I need to write the following in an R program:
X[,1] = (C[1,1]*Y[,1])+(C[1,2]*Y[,2])+(C[1,3]*Y[,3]) + mu[1]
X[,2] = (C[2,1]*Y[,1])+(C[2,2]*Y[,2])+(C[2,3]*Y[,3]) + mu[2]
X[,3] = (C[3,1]*Y[,1])+(C[3,2]*Y[,2])+(C[3,3]*Y[,3]) + mu[3]
I'm writing the following:
for (i in 1:3){
for (j in 1:3) {
X[,i] = sum((C[i,j]*Y[,j]))+ mu[i]
}
}
but the result is not the same as writing out all three lines above. Can anyone help?
There is no need for a for-loop in your calculation.
Let's say that your data are something like this:
set.seed(1)
C = matrix(1:9, nrow = 3, ncol = 3)
C
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Y = 1:3
mu = rnorm(3)
mu
[1] -0.6264538 0.1836433 -0.8356286
Then you can simply write:
X = C%*%Y + mu
X
[,1]
[1,] 29.37355
[2,] 36.18364
[3,] 41.16437
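With your nested for-loop, each pass of the inner j-loop overwrites X[,i] instead of adding to it, so you end up with only the last term (j = 3): sum(C[i,3]*Y[,3]) + mu[i]. In addition, sum() collapses the whole column to a single number.
If you insist on a for-loop solution, use only one loop: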
for (i in 1:3){
X[i] <- sum(C[i, 1:3]*Y) + mu[i]
}
X
[,1]
[1,] 29.37355
[2,] 36.18364
[3,] 41.16437
Moreover, if Y is also a matrix, you can try this:
set.seed(1)
C = matrix(1:9, nrow = 3, ncol = 3)
Y = matrix(1:9, nrow = 3, ncol = 3)
mu = rnorm(3)
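# Vectorised: mu is recycled down the columns, so mu[i] is added to row i
X = C%*%Y + mu
# Equivalent explicit loop
X <- matrix(0, nrow = 3, ncol = 3)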
for (i in 1:3){
for (j in 1:3){
X[i, j] <- sum(C[i, ]*Y[, j]) + mu[i]
}
}
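If Y is laid out as in the question (one row per observation, one column per variable), the same calculation can be sketched as below. The number of rows, the random Y and the values of mu are made up for illustration. sweep() is used here to add mu[i] to column i, and the double loop shows how to accumulate terms instead of overwriting them.

```r
set.seed(1)
n  <- 5                                   # hypothetical number of observations
C  <- matrix(1:9, nrow = 3, ncol = 3)
Y  <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
mu <- c(10, 20, 30)                       # hypothetical shifts

# The three explicit formulas from the question
X_explicit <- cbind(
  C[1, 1] * Y[, 1] + C[1, 2] * Y[, 2] + C[1, 3] * Y[, 3] + mu[1],
  C[2, 1] * Y[, 1] + C[2, 2] * Y[, 2] + C[2, 3] * Y[, 3] + mu[2],
  C[3, 1] * Y[, 1] + C[3, 2] * Y[, 2] + C[3, 3] * Y[, 3] + mu[3]
)

# Vectorised: column i of Y %*% t(C) is sum_j C[i, j] * Y[, j]
X_vec <- sweep(Y %*% t(C), 2, mu, "+")

# Double loop that accumulates the terms rather than overwriting them
X_loop <- matrix(0, nrow = n, ncol = 3)
for (i in 1:3) {
  X_loop[, i] <- mu[i]
  for (j in 1:3) {
    X_loop[, i] <- X_loop[, i] + C[i, j] * Y[, j]
  }
}

all.equal(X_explicit, X_vec)
all.equal(X_explicit, X_loop)
```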
I'm trying to simulate data using a for loop and store it in a matrix with the following code:
m <- matrix(nrow = 500 , ncol = 7)
for(i in seq(from = 1, to = 4, by = 0.5)){
a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
m[, i] <- a
}
But instead of giving me 7 columns with means of roughly 1, 1.5, 2, 2.5, 3, 3.5 and 4, matrix m contains 4 columns with means of roughly 1.5, 2.5, 3.5 and 4, and 3 columns of NA values.
If I change the increment to 1 and run the code below, everything behaves as expected, so the issue seems to be with the increments, but I can't figure out what I should do differently. Help would be most appreciated.
m <- matrix(nrow = 500 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
m[, i] <- a
}
Column indices must be whole numbers. A fractional index such as 1.5 is silently truncated to 1, so several iterations write to the same column. You can fix this by mapping i to an integer index with (i * 2) - 1:
# reduce number of rows for showcase
n <- 100
m <- matrix(nrow = n , ncol = 7)
for(i in seq(from = 1, to = 4, by = 0.5)){
# NOTE: 1*i does not change anything
a <- 1*i + rnorm(n = n, mean = 0, sd = 1)
# make column index integerish
m[, (i * 2) - 1] <- a
}
m[1:5, ]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1.15699467 0.8917952 1.999899 2.330557 4.502607 4.469957 5.687460
#> [2,] -1.13634309 1.5394771 1.700148 1.669329 2.124019 3.472836 3.513351
#> [3,] 2.08584731 1.0591743 2.866186 3.192953 3.984286 3.593902 3.983265
#> [4,] 0.02211767 2.2222376 2.055832 2.927851 2.846376 3.411725 3.742966
#> [5,] 0.49167319 2.2244472 2.190050 3.525931 2.841522 5.722172 4.797856
colMeans(m)
#> [1] 0.8537568 1.6805235 1.9907633 2.6434843 2.8651140 3.5499583 3.9757984
rnorm accepts vectorized input for the mean, so you can try the code below (you still need matrix to arrange the output into the desired dimensions):
nr <- 500
nc <- 7
m <- t(matrix(rnorm(nr * nc, seq(1, 4, 0.5), 1), nc, nr))
where you can see, for example
> m[1:5, ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 3.2157776 0.3805689 0.7550255 2.508356 3.567479 2.597378 4.122201
[2,] 0.8634009 0.4887092 2.5655513 1.710756 2.377790 3.733045 4.199812
[3,] -0.1786419 2.4471083 1.2138140 3.090687 2.763694 3.471715 4.676037
[4,] 1.2492511 2.3480447 2.2180039 1.965656 1.505342 3.832380 4.086075
[5,] -0.1301543 1.7463687 1.2467769 2.649525 4.795677 2.606623 4.318468
> colMeans(m)
[1] 0.901146 1.476423 1.900147 2.567463 2.996918 3.468140 4.025929
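An alternative sketch that avoids the transpose: repeat each mean nr times with rep(..., each = nr), so the draws already come out in column order. The names means and m2 are introduced here for illustration only.

```r
set.seed(1)
nr <- 500
means <- seq(1, 4, 0.5)
nc <- length(means)

# The first nr draws use means[1], the next nr use means[2], and so on,
# which matches the column-major order in which matrix() fills its cells.
m2 <- matrix(rnorm(nr * nc, mean = rep(means, each = nr), sd = 1),
             nrow = nr, ncol = nc)

dim(m2)
round(colMeans(m2), 1)
```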
You're using i as a column index, but i takes non-integer values. Only whole numbers can be used for indexing a matrix/df. When i is, say, 1.5 and you use it in the m[,i] expression, it gets truncated to the integer 1, so the first 2 runs of your loop write to the same column (as do the 3rd and 4th, etc.), and the later run overwrites the earlier one.
You could simply use your second code and replace 1*i with 0.5 + 0.5*i:
m <- matrix(nrow = 5000 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
a <- 0.5 + 0.5*i + rnorm(n = 5000, mean = 0, sd = 1)
m[,i] <- a
}
However, it may be better to use the params of the rnorm function to generate values with a specified mean/sd: currently, you are drawing from a normal distribution centered around 0 then shifting it sideways; you could simply tell it to use the mean you actually want.
m <- matrix(nrow = 5000 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
m[,i] <- rnorm(n = 5000, mean = 0.5 + 0.5*i, sd = 1)
}
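A minimal sketch of the truncation behaviour, plus a loop that keeps the column position and the mean separate by iterating over seq_along(). The vector v and the name means are illustrative and not part of the original code.

```r
# Fractional indices are truncated toward zero before subsetting
v <- c(10, 20, 30)
v[1.5]   # same element as v[1]
v[2.9]   # same element as v[2]

# Loop over positions, look up the mean for each position
means <- seq(from = 1, to = 4, by = 0.5)
m <- matrix(nrow = 500, ncol = length(means))
set.seed(1)
for (k in seq_along(means)) {
  m[, k] <- rnorm(n = 500, mean = means[k], sd = 1)
}
round(colMeans(m), 1)
```

Separating the index k from the value means[k] means the step size can change without touching the indexing arithmetic.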
I have two vectors x and y of coordinates and a 3D array A in R. I want to produce a matrix, where the i'th row is A[x[i], , y[i]].
If A was 2D, I believe I could use A[cbind(x,y)]. For the 3D array, I think the following works, but it's kind of slow:
sapply(1:length(x), function(i) A[x[i],,y[i]]).
Is there a faster way to do this, e.g. by somehow using cbind?
Edit:
For instance, consider the following
A = array(1:12, c(2,2,3))
x = c(1,2,1)
y = c(1,2,3)
> A
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
, , 3
[,1] [,2]
[1,] 9 11
[2,] 10 12
I would like to get the following output, but with faster code:
> t(sapply(1:length(x), function(i) A[x[i],,y[i]]))
[,1] [,2]
[1,] 1 3
[2,] 6 8
[3,] 9 11
indices <- cbind(
rep(x, each = dim(A)[2]),
rep(seq_len(dim(A)[2]), times = length(x)),
rep(y, each = dim(A)[2])
)
identical(
array(A[indices], dim = c(dim(A)[2], length(x))),
sapply(1:length(x), function(i) A[x[i],,y[i]])
)
#> [1] TRUE
Data:
A <- array(1:30, dim = c(2, 3, 5))
n <- 7
set.seed(123)
x <- sample(dim(A)[1], n, replace = TRUE)
y <- sample(dim(A)[3], n, replace = TRUE)
Performance: not necessarily better, depends on use case, see plot for insights
set.seed(42)
create_data <- function(array_size, coordinates_size) {
list(
A = array(1:array_size^3, dim = rep(array_size, 3)),
x = sample(array_size, coordinates_size, replace = TRUE),
y = sample(array_size, coordinates_size, replace = TRUE)
)
}
results <- bench::press(
array_size = c(10, 100, 1e3),
coordinates_size = c(100, 1e3, 10e3),
{
dat <- create_data(array_size, coordinates_size)
A <- dat[["A"]] ; x <- dat[["x"]] ; y <- dat[["y"]]
bench::mark(
sapply = {
sapply(1:length(x), function(i) A[x[i],,y[i]])
},
cbind = {
indices <- cbind(
rep(x, each = dim(A)[2]),
rep(seq_len(dim(A)[2]), times = length(x)),
rep(y, each = dim(A)[2])
)
array(A[indices], dim = c(dim(A)[2], length(x)))
}
)
}
)
ggplot2::autoplot(results)
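Applied to the small example from the question, the index-matrix approach can be sketched as follows. Note that the comparison above is against sapply() without t(), so it yields one column per coordinate pair. Filling the result with byrow = TRUE gives one row per pair, as the question asks. The names k, res and expected are introduced here for illustration.

```r
A <- array(1:12, c(2, 2, 3))
x <- c(1, 2, 1)
y <- c(1, 2, 3)

k <- dim(A)[2]   # length of the middle dimension

# One index row (x[i], j, y[i]) for every i and every j in 1..k
indices <- cbind(
  rep(x, each = k),
  rep(seq_len(k), times = length(x)),
  rep(y, each = k)
)

# Values come back grouped by i, so fill row by row
res <- matrix(A[indices], nrow = length(x), ncol = k, byrow = TRUE)

expected <- t(sapply(seq_along(x), function(i) A[x[i], , y[i]]))
identical(res, expected)
```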
I have written a function that is supposed to work when each of its arguments has a length >= 2.
But I'm wondering why the function only works when each of its arguments has a length >= 3!
Am I missing something? (Is there a fix so that the function also works when each argument has length 2?)
[Note: I always expect the output of the function (i.e., CI) to be a matrix with 2 columns and length(n) rows, except when length(n) == 2. When length(n) == 2, I expect the output to have 1 row and 2 columns.]
abc <- function(n, yes, a, b = a){
p <- list()
for(i in 1:length(n)){
p[[i]] <- rbeta(1e3, a[i] + yes[i], b[i] + (n[i] - yes[i]))
}
ps <- combn(p, 2, FUN = function(x) x[[1]]- x[[2]])
CI <- matrix(NA, length(n), 2)
for(i in 1:length(n)){
CI[i, ] <- quantile(ps[, i], c(.025, .975))
}
CI
}
For example:
abc(n = c(10, 20, 30), yes = rep(5, 3), a = rep(1, 3)) # Works well :-)
abc(n = c(10, 20), yes = rep(5, 2), a = rep(1, 2)) # Doesn't work! :-(
# Error in ps[, i] : subscript out of bounds
There is an easy fix. Replace length(n) with ncol(ps) when creating the result matrix and when running the for loop that copies values into CI. This makes more sense because the number of combinations generated by combn is choose(length(n), 2), which generally differs from length(n).
abc <- function(n, yes, a, b = a){
p <- list()
for(i in 1:length(n)){
p[[i]] <- rbeta(1e3, a[i] + yes[i], b[i] + (n[i] - yes[i]))
}
str(p)
ps <- combn(p, 2, FUN = function(x) x[[1]]- x[[2]])
CI <- matrix(NA, ncol(ps), 2)
for(i in 1:ncol(ps)){
CI[i, ] <- quantile(ps[, i], c(.025, .975), na.rm = TRUE)
}
CI
}
#Results
#> abc(n = c(10, 20, 30), yes = rep(5, 3), a = rep(1, 3))
# [,1] [,2]
#[1,] -0.10141014 0.5774627
#[2,] 0.02638096 0.6159326
#[3,] -0.12473451 0.3069135
#> abc(n = c(10, 20), yes = rep(5, 2), a = rep(1, 2))
# [,1] [,2]
#[1,] -0.1228497 0.5304606
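To see why length(n) is the wrong size for CI, here is a small sketch that counts the columns combn() produces for different numbers of groups. The uniform draws are placeholders standing in for the rbeta samples, and the name n_pairs is introduced for illustration.

```r
set.seed(1)
n_pairs <- integer(0)
for (k in 2:5) {
  # k placeholder samples, stored in a list like p in abc()
  p <- replicate(k, runif(10), simplify = FALSE)
  ps <- combn(p, 2, FUN = function(x) x[[1]] - x[[2]])
  n_pairs[as.character(k)] <- ncol(ps)
}
n_pairs                      # columns of ps for k = 2, 3, 4, 5
choose(2:5, 2)               # number of unordered pairs for the same k
```

Only k = 3 happens to give as many pairs as groups, which is why the original function appeared to work for three inputs.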
I use the following code to generate a matrix
randomdiv <-
function(nchrom, ndivs, size) {
sz <- matrix(nrow = nchrom, ncol = ndivs)
for (j in 1:nchrom) {
n <- size
for (i in 1:ndivs)
{
old_subs <- rbinom (1, n, 0.5)
num_chrom <- rep(1 / nchrom, nchrom)
new_subs <- rmultinom(1, size * nchrom / 2, prob = c(num_chrom))
m <- old_subs + new_subs
sz[j,i] <- m[1,1]
n <- m
}
}
return (sz)
}
>randomdiv(3, 3, 10)
[,1] [,2] [,3]
[1,] 11 13 12
[2,] 6 8 5
[3,] 12 11 9
The only adjustment I need to make is that when a 0 is generated in the column by the rbinom function, I need that occurrence to stay 0 for the remainder of the matrix, but anything > 0 needs to go through the rest of the loop and have new_subs added to it.
I have tried:
randomdiv <- function(nchrom, ndivs, size) {sz <- matrix(nrow = nchrom, ncol = ndivs)
for (j in 1:nchrom) {
n <- size
for (i in 1:ndivs)
{
old_subs <- rbinom (1, n, 0.5)
num_chrom <- rep(1/nchrom, nchrom)
new_subs <- rmultinom(1, size*nchrom/2, prob = c(num_chrom))
m <- ifelse(old_subs>0, old_subs + new_subs, old_subs+0)
sz[j,i] <- m[1,1]
n <- m
}
}
return (replicate(ncell, sz, simplify = FALSE))
}
> randomdiv(3, 3, 10)
#Error in m[1, 1] : incorrect number of dimensions
I've tried a few different tactics with the ifelse function, but I think it only treats the columns as a whole, so if there is a 0 at all, nothing happens for the whole column, whereas I need each value in the columns to be treated individually.
You just need to use if() with an else and skip several lines of code if there's a 0:
randomdiv <-
function(nchrom, ndivs, size) {
sz <- matrix(nrow = nchrom, ncol = ndivs)
for (j in 1:nchrom) {
n <- size
for (i in 1:ndivs)
{
old_subs <- rbinom (1, n, 0.5)
if(old_subs>0){
num_chrom <- rep(1 / nchrom, nchrom)
new_subs <- rmultinom(1, size * nchrom / 2, prob = c(num_chrom))
m <- old_subs + new_subs
sz[j,i] <- m[1,1]
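} else {
# a 0 stays 0: store it and carry 0 forward, so rbinom() can only return 0 from here on
sz[j,i] <- old_subs
m <- old_subs
}
n <- m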
}
}
return (sz)
}
randomdiv(3, 3, 2)
# [,1] [,2] [,3]
# [1,] 2 2 0
# [2,] 1 2 4
# [3,] 1 1 0
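A short sketch of why the ifelse() attempt fails with "incorrect number of dimensions": ifelse() returns a result shaped like its test, which is a single value here, so the matrix returned by rmultinom() loses its dimensions. The numbers below are made up for illustration.

```r
old_subs <- 3L
new_subs <- matrix(c(5L, 4L, 6L), ncol = 1)   # stands in for rmultinom(1, ...)

# ifelse(): result has the length and shape of the test (length 1, no dim)
m1 <- ifelse(old_subs > 0, old_subs + new_subs, old_subs)
length(m1)
is.null(dim(m1))     # TRUE, so m1[1, 1] would throw an error

# if / else: evaluates one branch and returns it unchanged
m2 <- if (old_subs > 0) old_subs + new_subs else old_subs
dim(m2)              # still a 3 x 1 matrix, so m2[1, 1] works

# A binomial draw with size 0 can only be 0, so a 0 carried forward stays 0
all(rbinom(100, size = 0, prob = 0.5) == 0)
```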
I have 2 given matrices
a1 <- matrix(c(0.4092951, 0.1611806, 0.4283178, 0.001206529), nrow =
1)
a2 <- matrix(c(0.394223557, 0.140443266, 0.463980790, 0.001352387),
nrow = 1)
I have an initial matrix
b <- matrix(c(0.4095868, 0.1612955, 0.4286231, 0.0004946572,
0, 0.2732351, 0.7260891, 0.0006757670,
0, 0, 0.9909494, 0.0090505527,
0, 0, 0, 1), nrow = 4, byrow = T)
I need to update 'b' such that
a1 %*% b = a2
This is an optimization problem: the objective is to minimize
sum(abs(a1 %*% b - a2)),
ideally driving it to zero, subject to the constraints:
Lower triangle(b) = 0 ;
RowSum(b) = 1
## creating a data vector with a1 and a2
data = c(as.numeric(a1), as.numeric(a2))
## objective function
min_obj <- function(p){
## Creating a matrix to recreate 'b'
p1 <- matrix(rep(0, 16), nrow = 4)
k = 1
for(i in 1:nrow(p1)){
for (j in 1:ncol(p1)){
if(j >= i){
p1[i,j] <- p[k]
k = k+1
}
}
}
actual <- matrix(data[1:(length(data)/2)], nrow = 1)
pred <- matrix(data[(length(data)/ 2 + 1):length(data)], nrow = 1)
s <- (actual %*% p1) - pred
sum(abs(s))
}
## Initializing the initial values for b taking only non-zero values
init <- b[b>0]
opt <- optim(init, min_obj, control = list(trace = T), method =
"L-BFGS-B", lower = rep(0, length(init)), upper = rep(1,
length(init)))
transformed_b <- matrix(rep(0, 16), nrow = 4)
k = 1
for(i in 1:nrow(transformed_b)){
for (j in 1:ncol(transformed_b)){
if(j >= i){
transformed_b[i,j] <- opt$par[k]
k = k+1
}
}
}
transformed_b
The issue with transformed_b is that the row sums of the matrix are not 1. Any help is highly appreciated.
"optim" is the right choice. Since the row sums have to be 1, there are only 6 parameters, not 10 as in your attempt. The diagonal is uniquely determined by the values strictly above the diagonal.
a1 <- matrix(c(0.4092951, 0.1611806, 0.4283178, 0.001206529), nrow =
1)
a2 <- matrix(c(0.394223557, 0.140443266, 0.463980790, 0.001352387),
nrow = 1)
b <- matrix(c(0.4095868, 0.1612955, 0.4286231, 0.0004946572,
0, 0.2732351, 0.7260891, 0.0006757670,
0, 0, 0.9909494, 0.0090505527,
0, 0, 0, 1), nrow = 4, byrow = T)
#======================================================================
# Build an upper triangular matrix with rowsums 1:
B <- function(x)
{
X <- matrix(c(0,x[1:3],0,0,x[4:5],0,0,0,x[6],rep(0,4)),4,4,byrow=TRUE)
diag(X) <- 1-rowSums(X)
return(X)
}
#----------------------------------------------------------------------
# The function we want to minimize:
f <- function(x)
{
return (sum((a1%*%B(x) - a2)^2))
}
#----------------------------------------------------------------------
#Optimization:
opt <- optim( par = c(b[1,2:4],b[2,3:4],b[3,4]),
fn = f,
lower = rep(0,6),
method = "L-BFGS-B" )
optB <- B(opt$par)
Result:
> optB
[,1] [,2] [,3] [,4]
[1,] 0.9631998 0.03680017 0.0000000 0.0000000000
[2,] 0.0000000 0.77820700 0.2217930 0.0000000000
[3,] 0.0000000 0.00000000 0.9998392 0.0001608464
[4,] 0.0000000 0.00000000 0.0000000 1.0000000000
> a1 %*% optB - a2
[,1] [,2] [,3] [,4]
[1,] 9.411998e-06 5.07363e-05 1.684534e-05 -7.696464e-05
> rowSums(optB)
[1] 1 1 1 1
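I chose the sum of squares instead of the sum of absolute values because it is differentiable. With no gradient supplied, "optim" approximates one by finite differences for "L-BFGS-B", so a smooth objective should make it easier to find the minimum, I guess.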
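As a quick check of the parametrization, the sketch below rebuilds B() and verifies both constraints for an arbitrary parameter vector. The values in x are hypothetical and chosen only so that each row's off-diagonal entries sum to less than 1.

```r
# Same construction as B() above: six free values strictly above the diagonal,
# diagonal filled in so that every row sums to 1
B <- function(x) {
  X <- matrix(c(0, x[1:3], 0, 0, x[4:5], 0, 0, 0, x[6], rep(0, 4)),
              4, 4, byrow = TRUE)
  diag(X) <- 1 - rowSums(X)
  X
}

x <- c(0.10, 0.20, 0.05, 0.30, 0.10, 0.25)   # hypothetical parameter values
M <- B(x)

all(M[lower.tri(M)] == 0)           # lower triangle is zero
all.equal(rowSums(M), rep(1, 4))    # every row sums to 1
all(diag(M) >= 0)                   # diagonal stays non-negative for this x
```

The row-sum constraint holds by construction for any x, which is why it no longer needs to be passed to "optim". The lower bound of 0 alone does not stop the off-diagonal entries of a row from summing to more than 1, which would make that row's diagonal entry negative, so checking diag(optB) >= 0 after the fit is a cheap safeguard.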