c <- c(1.88, 2.33, -2.4, -0.6)
dim(c) <- c(2, 2)
I have a data set: a 9×12 matrix.
The data has been standardized to a normal distribution, so the elements are comparable with each other.
To make the comparison easier, I want to convert each value to a p-value.
How can I do this? (Please use the matrix above.)
Don't use c for a variable name (you know better):
A <- c(1.88, 2.33, -2.4, -0.6)
dim(A) <- c(2,2)
You are looking for pnorm:
pnorm(A)
#           [,1]        [,2]
# [1,] 0.9699460 0.008197536
# [2,] 0.9900969 0.274253118
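Note that pnorm gives the lower-tail probability P(Z <= z), not a conventional two-sided p-value. If two-sided p-values are what you actually want, one common construction (my addition, not part of the original answer) is:
2 * pnorm(-abs(A))   # two-sided p-value: P(|Z| >= |z|) for each element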
I have a set of columns with numerical values that describe a given object (row) in some 5-dimensional space. I want to compute the distance of each object from a fixed reference object at various times. I can group_by time and perform the desired computation; the issue is that I'm not sure how to do the computation itself. I want to use the squared Euclidean distance to measure the distance between objects in this 5-dimensional space, so clearly at each time the reference object should be at distance 0 from itself.
The metric should look like distance from object x to object Reference is
(x1 - Reference1)^2 + (x2 - Reference2)^2 + ....
I'm VERY new to working in R (and to programming in general), so I was hoping this exercise would help me learn; I apologize if my question is not appropriate.
My data looks like this:
Distances from rows to other rows can be done in base R with this:
mtx <- structure(c(2.8, 6.4, 1.7, 3.2, 24.2, 25.5, 5.4, 16.2, 15.6, 25.1, 8.6, 15.4, 0.7, 0.8, 0.1, 0.5, 0.1, 0.4, 0.04, 0.2), .Dim = 4:5)
outer(seq_len(nrow(mtx)), seq_len(nrow(mtx)),
      function(a, b) rowSums((mtx[a, ] - mtx[b, ])^2))
#          [,1]     [,2]     [,3]     [,4]
# [1,]   0.0000 105.0000 404.0136  64.2500
# [2,] 105.0000   0.0000 698.9696 190.9500
# [3,] 404.0136 698.9696   0.0000 165.3156
# [4,]  64.2500 190.9500 165.3156   0.0000
Granted, you only need to calculate slightly less than half of that matrix, since the diagonal is always zero and the upper and lower triangles mirror each other, but this gives you what you need. For instance, the distances from the third row to all other rows are in the third row (and in the third column).
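As a side note (my addition, not part of the original answer), base R's dist already computes the pairwise Euclidean distances between rows, so squaring its result reproduces the matrix above:
as.matrix(dist(mtx))^2   # same pairwise squared distances as the outer() call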
If all you need is one row compared to all others, then
rowSums((mtx[rep(3,nrow(mtx)),] - mtx)^2)
# [1] 404.0136 698.9696 0.0000 165.3156
The mtx[rep(3,nrow(mtx)),] repeats row 3 into a matrix of the same size as mtx, so that the elementwise subtraction works seamlessly.
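Since the question mentions grouping by time, here is one way to apply the same idea per group with dplyr. This is only a sketch under assumed, hypothetical names: a data frame df with a time column t, coordinate columns x1..x5, and a logical column is_ref that is TRUE for exactly one reference row per time:
library(dplyr)
df %>%
  group_by(t) %>%
  mutate(dist2 = (x1 - x1[is_ref])^2 + (x2 - x2[is_ref])^2 +
                 (x3 - x3[is_ref])^2 + (x4 - x4[is_ref])^2 +
                 (x5 - x5[is_ref])^2) %>%
  ungroup()
# dist2 is 0 on each group's reference row, as required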
I want to create a matrix in R with elements [-1, 0, 1] drawn with probabilities [1/6, 2/3, 1/6] respectively. The probabilities may change during runtime. For static probabilities I have got the output, but the problem is a dynamic change in the probabilities.
For example, if I create a matrix for the above probabilities with the values [sqrt(3), 0, -sqrt(3)], the required output is:
Note: the probabilities should not be static as mentioned; they may vary during runtime.
Kindly help to solve this.
Supposing you want a 2x3 matrix:
matrix(sample(c(-1,0,1), size=6, replace=TRUE, prob=c(1/6,2/3,1/6)), nrow=2)
So you sample from the values you want, with probabilities defined in prob. This gives just a vector, but you can shape it into a matrix of the desired dimensions with matrix afterwards. To keep it dynamic, pass a variable holding the probabilities instead of hard-coded values.
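For instance (my illustration of that point), keeping the probabilities in a variable lets you change them between calls, here with the [sqrt(3), 0, -sqrt(3)] values from the question:
p <- c(1/6, 2/3, 1/6)   # can be reassigned at any point during runtime
matrix(sample(c(-sqrt(3), 0, sqrt(3)), size=6, replace=TRUE, prob=p), nrow=2)
p <- c(0.3, 0.4, 0.3)   # updated probabilities, same call
matrix(sample(c(-sqrt(3), 0, sqrt(3)), size=6, replace=TRUE, prob=p), nrow=2)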
If the numbers should be distributed according to a certain scheme rather than randomly drawn according to a probability, replicate the vector elements and shuffle them:
matrix(sample(rep(c(-1,0,1), times=c(1,4,1))), nrow=2)
You can try this to generate an m×n matrix:
sample.dynamic.matrix <- function(pop.symbols, probs, m, n) {
  samples <- sample(pop.symbols, m * n, prob = probs, replace = TRUE)
  return(matrix(samples, nrow = m))
}
set.seed(123)
sample.dynamic.matrix(-1:1, c(1/6,2/3,1/6), 2, 3)
#      [,1] [,2] [,3]
# [1,]    0    0   -1
# [2,]    1   -1    0
I'm given three consecutive monthly returns: 0.02, -0.03, 0.04, and I'm asked to plot the growth of an investment. Therefore I need to transform these into actual values. I think I can do it with a for loop in the following way:
test <- c(0.02, -0.03, 0.04)
z <- c(1)
for (i in 1:length(test)) {
  z[i+1] <- z[i] + z[i] * test[i]
}
z
But it's bad practice to use for, I assume, and on the other hand I bet it's also possible to achieve the same result with the apply family, isn't it?
Thus, I'd be grateful for any advice, thanks :).
You can try:
z <- z*c(1, cumprod(1+test))
Example:
test <- c(0.02, -0.03, 0.04)
z <- 1
z <- z*c(1, cumprod(1+test))
The result of the multiplication is:
> z*c(1, cumprod(1+test))
[1] 1.000000 1.020000 0.989400 1.028976
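If you specifically want a functional-style replacement for the loop, Reduce with accumulate = TRUE computes the same running product (an alternative I'm adding for comparison; cumprod is the more idiomatic choice here):
Reduce(function(acc, r) acc * (1 + r), test, init = 1, accumulate = TRUE)
# [1] 1.000000 1.020000 0.989400 1.028976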
How can I create a matrix of pseudo-random values that is guaranteed to be non-singular? I tried the code below, but it failed. I suppose I could just loop until I got one by chance but I would prefer a more elegant "R-like" solution if anyone has an idea.
library(matrixcalc)
exampledf <- matrix(ceiling(runif(16, 0, 50)), ncol=4)
is.singular.matrix(exampledf) #this may or may not return false
Using a while loop:
library(matrixcalc)
exampledf <- matrix(0, nrow=4, ncol=4)   # start from a singular matrix so the loop body runs
while (is.singular.matrix(exampledf)) {  # keep drawing while the matrix is singular
  exampledf <- matrix(ceiling(runif(16, 0, 50)), ncol=4)
}
I suppose one method that actually guarantees (rather than merely makes it likely) that the matrix is non-singular is to start from a known non-singular matrix and apply the basic row operations used, for example, in Gaussian elimination: 1. add/subtract a multiple of one row to/from another row, or 2. multiply a row by a nonzero constant. Both operations preserve non-singularity.
Depending on how "random" and how dense you want your matrix to be, you can start from the identity matrix and multiply all elements by a random nonzero constant. Afterwards, apply a randomly selected set of the operations above; the result will still be non-singular, as sketched below. You can even apply a predefined sequence of operations, using a randomly selected constant at each step.
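A minimal sketch of that idea (my own illustration, not code from the original answer): start from a non-singular diagonal matrix and repeatedly add a random multiple of one row to another, which never changes the determinant:
random.nonsingular <- function(n, steps = 5 * n) {
  A <- diag(runif(n, 1, 10))     # nonzero diagonal => non-singular start
  for (i in seq_len(steps)) {
    rows <- sample(n, 2)         # pick two distinct rows
    A[rows[1], ] <- A[rows[1], ] + runif(1, -3, 3) * A[rows[2], ]
  }
  A
}
det(random.nonsingular(4))   # never zero, up to floating-point precision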
An alternative could be to start from an upper-triangular matrix whose main-diagonal entries are all nonzero, because the determinant of a triangular matrix is the product of the elements on its main diagonal. This effectively boils down to generating N nonzero random numbers, placing them on the main diagonal, and setting the entries above the diagonal to whatever you like. If you want the matrix to be fully dense, add the first row to every other row of the matrix; see the sketch below.
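A sketch of that triangular construction (again my own illustration, under the stated assumptions):
n <- 4
U <- matrix(0, n, n)
U[upper.tri(U)] <- runif(n * (n - 1) / 2, 0, 50)  # arbitrary entries above the diagonal
diag(U) <- runif(n, 1, 50)                        # strictly nonzero diagonal
for (i in 2:n) U[i, ] <- U[i, ] + U[1, ]          # densify without changing the determinant
det(U)   # equals the product of the original diagonal, hence nonzero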
Of course this approach (like probably any other) assumes the matrix is numerically well behaved and that singularity will not be introduced by precision errors (as you know, the precision of numeric types is limited in every programming language). You would do well to avoid very small or very large values, which can make the method numerically unstable.
It should be fairly unlikely that this will produce a singular matrix:
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
crossprod(Mat1,Mat2)
        [,1]   [,2]   [,3]   [,4]
[1,]  0.8138  5.112  2.945 -5.003
[2,]  4.9755 -2.420  1.801 -4.188
[3,] -3.8579  8.791 -2.594  3.340
[4,]  7.2057  6.426  2.663 -1.235
solve( crossprod(Mat1,Mat2) )
         [,1]     [,2]     [,3]    [,4]
[1,] -0.11273  0.15811  0.05616 0.07241
[2,]  0.03387  0.01187  0.07626 0.02881
[3,]  0.19007 -0.60377 -0.40665 0.17771
[4,] -0.07174 -0.31751 -0.15228 0.14582
inv1000 <- replicate(1000, {
  Mat1 <- matrix(rnorm(100), ncol=4)
  Mat2 <- matrix(rnorm(100), ncol=4)
  try(solve(crossprod(Mat1, Mat2)))
})
str(inv1000)
#num [1:4, 1:4, 1:1000] 0.1163 0.0328 0.3424 -0.227 0.0347 ...
max(inv1000)
#[1] 451.6
> inv100000 <- replicate(100000, {Mat1 <- matrix(rnorm(100), ncol=4)
+ Mat2 <- matrix(rnorm(100), ncol=4)
+ is.singular.matrix( crossprod(Mat1,Mat2))} )
> sum(inv100000)
[1] 0
I am currently using Python and RPy to use the functionality inside R.
How do I use an R library to generate Monte Carlo samples that honor the correlation between two variables?
E.g. if variables A and B have a correlation of 85% (0.85), I need to generate all the Monte Carlo samples honoring that correlation between A and B.
I would appreciate it if anyone could share ideas / snippets. Thanks.
The rank correlation method of Iman and Conover seems to be a widely used and general approach to producing correlated Monte Carlo samples for computer-based experiments, sensitivity analysis, etc. Unfortunately I have only just come across it and don't have access to the PDF, so I don't know how the authors actually implement their method, but you could follow this up.
Their method is more general because each variable can come from a different distribution, unlike the multivariate normal of @Dirk's answer.
Update: I found an R implementation of the above approach in package mc2d, in particular you want the cornode() function.
Here is an example taken from ?cornode
> require(mc2d)
> x1 <- rnorm(1000)
> x2 <- rnorm(1000)
> x3 <- rnorm(1000)
> mat <- cbind(x1, x2, x3)
> ## Target
> (corr <- matrix(c(1, 0.5, 0.2, 0.5, 1, 0.2, 0.2, 0.2, 1), ncol=3))
     [,1] [,2] [,3]
[1,]  1.0  0.5  0.2
[2,]  0.5  1.0  0.2
[3,]  0.2  0.2  1.0
> ## Before
> cor(mat, method="spearman")
            x1         x2          x3
x1  1.00000000 0.01218894 -0.02203357
x2  0.01218894 1.00000000  0.02298695
x3 -0.02203357 0.02298695  1.00000000
> matc <- cornode(mat, target=corr, result=TRUE)
Spearman Rank Correlation Post Function
          x1        x2        x3
x1 1.0000000 0.4515535 0.1739153
x2 0.4515535 1.0000000 0.1646381
x3 0.1739153 0.1646381 1.0000000
The rank correlations in matc are now very close to the target correlations of corr.
The idea with this is that you draw the samples separately from the distribution for each variable, and then use the Iman & Conover approach to bring the samples as close to the target correlations as possible.
That is a FAQ. Here is one answer using a recommended package:
R> library(MASS)
R> example(mvrnorm)
mvrnrmR> Sigma <- matrix(c(10,3,3,2),2,2)
mvrnrmR> Sigma
     [,1] [,2]
[1,]   10    3
[2,]    3    2
mvrnrmR> var(mvrnorm(n=1000, rep(0, 2), Sigma))
        [,1]    [,2]
[1,] 8.82287 2.63987
[2,] 2.63987 1.93637
mvrnrmR> var(mvrnorm(n=1000, rep(0, 2), Sigma, empirical = TRUE))
     [,1] [,2]
[1,]   10    3
[2,]    3    2
R>
Switching between correlation and covariance is straightforward (hint: outer product of vector of standard deviations).
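To make that hint concrete (my illustration, not part of the original answer): the covariance matrix is the correlation matrix multiplied elementwise by the outer product of the standard deviations, and stats::cov2cor goes the other way:
sds <- sqrt(c(10, 2))                                # standard deviations from Sigma's diagonal
R <- matrix(c(1, 3/sqrt(20), 3/sqrt(20), 1), 2, 2)   # correlation matrix
outer(sds, sds) * R                                  # recovers Sigma
#      [,1] [,2]
# [1,]   10    3
# [2,]    3    2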
This question was not tagged as python, but based on your comment it looks like you might be looking for a Python solution as well. The most basic Python implementation of Iman Conover that I can concoct looks like the following (in numpy):
from numpy import zeros, size, argsort, sort, take, array
from numpy.random import multivariate_normal

def makeCorrelated(y, corMatrix):
    # Gaussian scores carrying the target correlation structure
    c = multivariate_normal(zeros(size(y, 0)), corMatrix, size(y, 1))
    # Column-wise ranks of those scores, transposed to align with rows of y
    key = argsort(argsort(c, axis=0), axis=0).T
    # Reorder each sorted marginal sample according to the Gaussian ranks
    out = array([take(sort(yi), ki) for yi, ki in zip(y, key)])
    return out
where y is an array of samples from the marginal distributions and corMatrix is a positive semi-definite, symmetric correlation matrix. Given that this function uses multivariate_normal() for the c matrix, you can tell it uses an implied Gaussian copula. To use different copula structures you'll need a different driver for the c matrix.