I would like to create two randomly generated matrices in R such that the sum of each row of the first matrix equals the sum of the corresponding column of the second matrix.
I know how to create a randomly generated matrix, for example:
> A=matrix(rnorm(n=9, 0, 1), nrow=3, ncol=3)
> B=matrix(rnorm(n=9, 1, 5), nrow=3, ncol=3)
but how can I impose this restriction when I am generating these two matrices?
Would something like this work for you:
size <- 10
matrix_1 <- matrix(nrow = size, ncol = size)
matrix_2 <- matrix(nrow = size, ncol = size)
for (i in seq_len(size)) {
  vector <- rnorm(n = size, 1, 5)
  matrix_1[i, ] <- vector
  matrix_2[, i] <- sample(vector, size)
}
The only problem is that the second matrix is not fully random: each of its columns is a random permutation of the corresponding row of the first matrix.
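If the second matrix should be drawn independently of the first, one possible alternative (not the approach above, just a sketch) is to generate both matrices freely and then shift each column of the second by a constant so that its sum matches the corresponding row sum of the first. The names n and shift are illustrative, and the shift changes the column means of B, so its entries no longer follow exactly the distribution they were drawn from.

```r
set.seed(123)
n <- 3
A <- matrix(rnorm(n * n, 0, 1), nrow = n, ncol = n)
B <- matrix(rnorm(n * n, 1, 5), nrow = n, ncol = n)

# Amount to add to every entry of column j so that
# colSums(B)[j] becomes rowSums(A)[j]
shift <- (rowSums(A) - colSums(B)) / n

# Add shift[j] to each element of column j
B <- sweep(B, 2, shift, "+")

all.equal(rowSums(A), colSums(B))
```

Only the column totals of B are tied to A here; the spread of values within each column is unchanged.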
I wish to apply a custom function to each element of a matrix whilst also using elements of a different matrix as inputs to the function.
Specifically, my function generates random samples from a von Mises distribution (circular normal distribution), calling the Rfast package's rvonmises function.
I have one matrix (radians) which records the angle I wish to use for the central tendency of the random generation (similar to the mean), and another matrix (kappa) which records the concentration parameter of the von Mises I wish to use (similar to standard deviation).
I wish to use (for example) element [1, 1] of the radians matrix together with element [1, 1] of the kappa matrix in a call to the von Mises random generator. So, my call for one element would be:
rvonmises(n = 1, m = radians[1, 1], k = kappa[1, 1])
But of course I want this applied across all elements of the matrices. (The rvonmises function doesn't accept multiple m or k values, so for example I couldn't use rvonmises(4, m = c(1, 2, 3, 4), k = c(1, 1.2, 1.4, 1.6)).)
To summarise: I am basically after a more principled (and faster!) way of doing this:
for (i in 1:nrow(radians)) {
  for (j in 1:ncol(radians)) {
    result[i, j] <- Rfast::rvonmises(1, radians[i, j], kappa[i, j])
  }
}
What I have tried
Based on this post, I have tried to use mapply:
library(Rfast)
set.seed(42)
# random radians to use as input
radians <- matrix(data = runif(12, 0, 2 * pi),
ncol = 4)
# random concentration parameters of the von Mises distribution
kappa <- matrix(data = rgamma(12, 70, 30),
ncol = 4)
# function to generate random von mises sample with angle x and
# concentration parameter k
my_function <- function(m, k){
Rfast::rvonmises(1, m, k)
}
# my attempt
out <- matrix(mapply(my_function, m = as.data.frame(radians), k = kappa),
ncol = 4, byrow = TRUE)
However, I don't think this is working. For example, if I test it by the following (where the central tendency in test_radians increases steadily and I use large values for kappa which leads to precise estimates):
test_radians <- matrix(data = seq(from = 1, to = 2 * pi, length.out = 12),
ncol = 4)
test_kappa <- matrix(data = rep(20, times = 12),
ncol = 4)
test <- matrix(mapply(my_function, m = as.data.frame(test_radians),
k = test_kappa),
ncol = 4, byrow = TRUE)
test[1, 1] should be the smallest (on average), and test[3, 4] should be the largest. (I know that due to random variability this won't always be the case, but I've tried it with many replications.)
So, the mapping and matching between matrices isn't working as I had anticipated.
Any guidance welcomed.
You cannot compute the mean of circular observations by simply calling mean(); that gives the wrong answer. The correct way is to take the mean of the cosines and the mean of the sines of the angles and then apply the arc tangent. See the packages for directional or circular data for this.
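As a minimal base R sketch of that recipe (the function name circular_mean is hypothetical, and angles are assumed to be in radians), the mean direction can be computed with atan2:

```r
# Circular mean: average the sines and cosines, then take the angle
# of the resulting vector, mapped into [0, 2 * pi)
circular_mean <- function(theta) {
  atan2(mean(sin(theta)), mean(cos(theta))) %% (2 * pi)
}

# Two angles close to zero, on either side of it
angles <- c(0.2, 2 * pi - 0.1)

mean(angles)           # arithmetic mean points in the opposite direction
circular_mean(angles)  # stays close to zero
```

The arithmetic mean lands near pi because it ignores that 0 and 2 * pi are the same direction.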
Secondly, you gave us an idea, to return a matrix of von Mises generated data. But, since brms does this job for you, at the moment I would go there.
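On the element-wise mapping itself, here is a sketch under stated assumptions. mapply treats a data frame as a list of columns, so passing as.data.frame(radians) hands whole columns to the function instead of single elements. Passing both matrices directly pairs element [i, j] with element [i, j] in column-major order. The generator draw_one below is a stand-in built on rnorm so that the example runs without extra packages; it is not a von Mises sampler, and the one-draw call from the question would take its place.

```r
set.seed(42)

# Centres increase steadily in column-major order; concentration is constant
m_mat <- matrix(seq(from = 1, to = 6, length.out = 12), ncol = 4)
k_mat <- matrix(20, nrow = 3, ncol = 4)

# Stand-in generator taking ONE centre and ONE concentration.
# Replace the body with Rfast::rvonmises(1, m, k) for von Mises draws.
draw_one <- function(m, k) rnorm(1, mean = m, sd = 1 / sqrt(k))

# Both matrices are traversed element by element in the same order
out <- mapply(draw_one, m_mat, k_mat)

# Restore the matrix shape; no byrow is needed because the order is
# already column-major
dim(out) <- dim(m_mat)
out
```

With this pairing, out[1, 1] is drawn around the smallest centre and out[3, 4] around the largest.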
I am given a matrix M. I now need to determine a matrix of the same dimension, which is defined by
N_{i,j} = M_{A(i,j),B(i,j)}
for two matrices A and B of the same dimension, which define indices.
As an example,
set.seed(1)
M <- matrix(LETTERS[1:(4*6)], ncol=6)
A <- matrix(sample(c(1:4), 4*6, replace=TRUE), ncol=6)
B <- matrix(sample(c(1:6), 4*6, replace=TRUE), ncol=6)
How do I now quickly determine N?
Try this:
replace(M, TRUE, M[cbind(c(A), c(B))])
or
array(M[cbind(c(A), c(B))], dim(M))
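Both versions rely on matrix indexing: when a two-column matrix is used as the index, each of its rows is read as one (row, column) pair. The sketch below reuses the example data and checks the result against an explicit double loop; N_loop is a name introduced here only for the comparison.

```r
set.seed(1)
M <- matrix(LETTERS[1:(4 * 6)], ncol = 6)
A <- matrix(sample(1:4, 4 * 6, replace = TRUE), ncol = 6)
B <- matrix(sample(1:6, 4 * 6, replace = TRUE), ncol = 6)

# c(A) and c(B) flatten in column-major order, so row r of the index
# matrix holds the (row, column) pair for the r-th element of N
N <- array(M[cbind(c(A), c(B))], dim(M))

# Reference implementation with explicit loops
N_loop <- M
for (i in seq_len(nrow(M))) {
  for (j in seq_len(ncol(M))) {
    N_loop[i, j] <- M[A[i, j], B[i, j]]
  }
}

identical(N, N_loop)
```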
I have a data frame with 6 columns of numerical data. I want to run a t-test on pairs of columns: the first and second columns, the third and fourth, and the fifth and sixth, so that I end up with 3 t-test p-values which I can store in a new data frame.
I know I can get the columns with select() and I can extract the p-values with t.test(x,y)$p.value. But I can't figure out a good way of automating this in a for loop.
I thought of doing something like this:
p-values <- for (i in select(dataframe,-1) t.test(i,?)$p.value
That is a start, but I am not sure where to go from here.
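A quick answer would be:
df <- matrix(rnorm(18), nrow = 3, ncol = 6)
p <- ncol(df)
# Column pairs (1, 2), (3, 4), (5, 6), one pair per row
indices <- cbind(seq(1, p, 2), seq(2, p, 2))
for (i in seq_len(nrow(indices))) {
  # Pass the two columns as separate samples; a single two-column
  # matrix would be pooled into a one-sample test
  print(t.test(df[, indices[i, 1]], df[, indices[i, 2]])$p.value)
}
It works only if you have an even number of columns.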
Creating an index of columns as Rémi Coulaud did is a good approach. Here is a variation that creates the same index and returns the paired p-values directly:
df <- as.data.frame(matrix(rnorm(36), nrow = 6, ncol = 6))
indices <- matrix(seq_len(ncol(df)), nrow = 2)
p.values <- apply(indices, 2, function(idx) t.test(df[[idx[1]]], df[[idx[2]]])$p.value)
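To store the results in a new data frame, as the question asks, a sketch along the same lines could look like this. The simulated data and the names first and res are assumptions for illustration, and the number of columns is assumed to be even.

```r
set.seed(1)
df <- as.data.frame(matrix(rnorm(60), nrow = 10, ncol = 6))

# Index of the first column in each pair: 1, 3, 5
first <- seq(1, ncol(df) - 1, by = 2)

# One row per pair, with the two column names and the p-value of a
# two-sample t-test between them
res <- data.frame(
  x = names(df)[first],
  y = names(df)[first + 1],
  p.value = vapply(first,
                   function(i) t.test(df[[i]], df[[i + 1]])$p.value,
                   numeric(1))
)
res
```

vapply with numeric(1) guarantees a plain numeric vector, so the p.value column has a predictable type.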
I wish to represent p-values and distances as the lower triangular and upper triangular entries of a single matrix. While I managed to create a UT or LT matrix for each, I have been unable to merge them into a single data frame in R.
dist[(upper.tri(dist,diag=FALSE))]=0 #upper tri of distances
pval[(lower.tri(pval,diag=FALSE))]=0 #lower tri of p-values
I tried the following line, but it does not work:
dist[(upper.tri(dist,diag=FALSE))]=pval[(lower.tri(pval,diag=FALSE))]
Any possible way of doing this?
I'm sure this could be done more elegantly, but I think this does what you want:
a <- matrix(0, nrow = 10, ncol = 10)
b <- matrix(1, nrow = 10, ncol = 10)
a[upper.tri(a)]
b[lower.tri(b)]
new <- matrix(NA, nrow = 10, ncol = 10)
new[upper.tri(new)] <- a[upper.tri(a)]
new[lower.tri(new)] <- b[lower.tri(b)]
new
Since you did not supply a reproducible example, I can't be sure, but basically I just take the upper and lower of matrices (one of 0s and the other of 1s) and combine them in new. As proof of concept, new has 0s above the diagonal, 1s below, and NAs on the diagonal itself. Hopefully this gives you some insight into your issue.
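Applied to distances and p-values, the same idea might look like the sketch below. The matrices dist_mat and pval_mat are simulated stand-ins (the question's data were not supplied), and both are assumed to be symmetric. Each triangle is copied using the same logical mask on both sides of the assignment, because upper.tri and lower.tri list their elements in different orders, so copying a lower triangle straight into an upper one scrambles the positions.

```r
set.seed(1)
n <- 4

# Hypothetical symmetric inputs
dist_mat <- as.matrix(dist(matrix(rnorm(n * 2), ncol = 2)))
pval_mat <- matrix(runif(n * n), nrow = n)
pval_mat <- (pval_mat + t(pval_mat)) / 2

# Distances below the diagonal, p-values above, NA on the diagonal
combined <- matrix(NA_real_, nrow = n, ncol = n)
combined[lower.tri(combined)] <- dist_mat[lower.tri(dist_mat)]
combined[upper.tri(combined)] <- pval_mat[upper.tri(pval_mat)]

combined_df <- as.data.frame(combined)
```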
Although this question has already been answered, I would like to add the following code for anyone who needs it in the future.
First, create two matrices of 10 by 10, with 1s and 2s only. Then using the package Matrix get only the lower and upper triangular matrices. Since there are no overlaps, we can simply use addition to combine the two matrices. Then convert the "dgeMatrix" first into a matrix and then to a data frame.
a <- matrix(1,10,10)
b <- matrix(2,10,10)
library(Matrix)
a <- tril(a, -1) # strict lower triangular matrix (omit diagonals)
b <- triu(b, 1) # strict upper triangular matrix
c <- a + b
c <- as.data.frame(as.matrix(c))
I'm trying to do a leave-one-out cross-validation on a relatively small dataset (n = 22, p = 17) on a linear regression made from the LARS algorithm. Essentially I need to create n matrices of standardized data (each column consists of entries centered by the mean and standardized by the SD of the column).
I've never used lists before, but would be open to making lists as long as columns of the different matrices can be manipulated/standardized.
Here's what I tried in R:
for (i in 1:n)
{
  x.standardized.i <- matrix(data = NA, nrow = (n-1), ncol = p) #creates n matrices, all n-1 x p
  for (j in 1:p)
  {
    x.standardized.i[,j] <- ((x[-i,j]-mean(x[-i,j]))/sd(x[-i,j])) #and standardizes the p variables with the ith row missing in each n matrix (i increments from 1 to n)
  }
}
I'm not sure if I can share the data, since it's related to grades from a class, but when I run the code it goes through the loop and stops by assigning a standardized matrix with the last row missing as x.standardized.i.
You can do this quite simply with sapply and scale:
# Create dummy data
m <- matrix(runif(200), ncol=10)
# Leave each row out in turn, and scale each column
A <- sapply(seq_len(nrow(m)), function(i) scale(m[-i, ]), simplify='array')
By default, scale centres each column on its mean, and divides by its sd.
For the example above, you'll end up with an array with 19 rows, 10 columns and 20 slices.
To access particular slices (i.e. cross-validation training folds), you can subset like this:
A[,, 1] # all rows, all cols, first slice
A[,, 10] # all rows, all cols, tenth slice
To confirm that columns are centred on their mean and standardised by one sd:
apply(A, c(2, 3), mean)
apply(A, c(2, 3), sd)
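Since the question mentions being open to lists, here is a list-based sketch of the same idea using lapply. It also standardises the left-out row with the training fold's centre and scale, which scale stores in the "scaled:center" and "scaled:scale" attributes. The names folds, train and held_out are illustrative, and whether the held-out row should be scaled this way depends on how the cross-validation is set up.

```r
set.seed(1)
m <- matrix(runif(200), ncol = 10)

folds <- lapply(seq_len(nrow(m)), function(i) {
  # Standardise the training rows (all rows except i)
  train <- scale(m[-i, , drop = FALSE])

  # Standardise the left-out row with the TRAINING means and sds
  held_out <- scale(m[i, , drop = FALSE],
                    center = attr(train, "scaled:center"),
                    scale = attr(train, "scaled:scale"))

  list(train = train, held_out = held_out)
})

length(folds)             # one element per left-out row
dim(folds[[1]]$train)     # 19 rows, 10 columns
dim(folds[[1]]$held_out)  # 1 row, 10 columns
```

Each fold is then available as folds[[i]]$train, with the matching test row in folds[[i]]$held_out.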