I have the following correlation matrix, cor.mat, shown below.
I want to multiply all values by 0.15 (i.e. by 15%) except the diagonal cells [1,1], [2,2], [3,3], and [4,4].
Does anyone have good code to implement this in R?
1 2 3 4
1 1.0000000 0.1938155 0.1738809 0.2465276
2 0.1938155 1.0000000 0.4045694 0.2729958
3 0.1738809 0.4045694 1.0000000 0.3340883
4 0.2465276 0.2729958 0.3340883 1.0000000
You can use diag, which extracts or sets the diagonal of a matrix:
cor.mat <- cor.mat * 0.15   # scale every entry by 0.15
diag(cor.mat) <- 1          # restore the diagonal to 1
Create a logical index matrix that excludes the diagonal and do the multiplication only on those cells (m1 is the correlation matrix):
i1 <- row(m1) != col(m1)   # TRUE for every off-diagonal cell
m1[i1] <- m1[i1] * 0.15    # scale only those cells
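For reference, here is a quick sketch (not from either answer) applying both approaches to the matrix from the question, rebuilt by hand as cor.mat; both leave the diagonal at 1:
cor.mat <- matrix(c(1.0000000, 0.1938155, 0.1738809, 0.2465276,
                    0.1938155, 1.0000000, 0.4045694, 0.2729958,
                    0.1738809, 0.4045694, 1.0000000, 0.3340883,
                    0.2465276, 0.2729958, 0.3340883, 1.0000000),
                  nrow = 4)
## Approach 1: scale everything, then reset the diagonal
scaled1 <- cor.mat * 0.15
diag(scaled1) <- 1
## Approach 2: scale only the off-diagonal cells via a logical index
scaled2 <- cor.mat
i1 <- row(scaled2) != col(scaled2)
scaled2[i1] <- scaled2[i1] * 0.15
identical(scaled1, scaled2)   # TRUE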
My aim is to eliminate duplicates from a dataset.
For that I wrote a program that calculates correlations.
I want to get the names of the variables whose correlation with another variable is higher than a threshold I specify.
Here's one of the results I got.
M926T709 M927T709_1 M927T709_2 M929T709
M926T709 1.0000000 0.9947082 0.9879702 0.8716944
M927T709_1 0.9947082 1.0000000 0.9955145 0.8785669
M927T709_2 0.9879702 0.9955145 1.0000000 0.8621052
M929T709 0.8716944 0.8785669 0.8621052 1.0000000
Let's say I want to obtain the names of the variables that have a correlation higher than 95%, so I should obtain this result:
M926T709, M927T709_1, M927T709_2
Edit: the answer given by Ronak Shah worked well, but I need to obtain the result as a vector so I can use the names afterwards.
Note that I shouldn't analyze the diagonal entries (each variable's correlation with itself), because they are always equal to 1.
Please tell me if you need any clarification, also tell me if you want to see my entire program.
Using rowSums and colSums, you can count how many values are greater than 0.95 in each row and column respectively, and then return the names:
tmp <- mat > 0.95    # mat is the correlation matrix from the question
diag(tmp) <- FALSE   # ignore the diagonal (self-correlations)
names(Filter(function(x) x > 0, rowSums(tmp) > 0 | colSums(tmp) > 0))
#[1] "M926T709" "M927T709_1" "M927T709_2"
Sample data: the limit and the correlation matrix m (with an added negative correlation for demonstration purposes):
limit <- 0.95
m <- as.matrix( read.table(text = "
M926T709 M927T709_1 M927T709_2 M929T709
M926T709 1.0000000 -0.9947082 0.9879702 0.8716944
M927T709_1 -0.9947082 1.0000000 0.9955145 0.8785669
M927T709_2 0.9879702 0.9955145 1.0000000 0.8621052
M929T709 0.8716944 0.8785669 0.8621052 1.0000000"))
Create a subset of the desired matrix and extract the row/column names.
Target <- unique( # Remove any duplicates
unlist( # merge subvectors of the `dimnames` list into one
dimnames( # gives all names of rows and columns of the matrix below
# Create a subset of the matrix that ignores correlations < limit
m[rowSums(abs(m) * upper.tri(m) > limit) > 0, # Rows
colSums(abs(m) * upper.tri(m) > limit) > 0] # Columns
),
recursive = FALSE))
Target
#> [1] "M926T709" "M927T709_1" "M927T709_2"
Created on 2021-10-25 by the reprex package (v2.0.1)
I have a sample dataframe:
data<-data.frame(a=c(1,2,3),b=c(4,5,5),c=c(6,8,7),d=c(8,9,10))
And wish to calculate the z-scores for every row in the data frame, so I did:
scores<-apply(data,1,zscore)
I used the zscore function from
install.packages(c("R.basic"), contriburl="http://www.braju.com/R/repos/")
And obtained this
row.names V1 V2 V3
a -1.2558275 -1.2649111 -1.0883839
b -0.2511655 -0.3162278 -0.4186092
c 0.4186092 0.6324555 0.2511655
d 1.0883839 0.9486833 1.2558275
But when I manually calculate the z-scores for the first row of the data frame, I obtain the following values:
-1.45, -0.29, 0.4844, 1.25
Manually, for the first row, I calculated as follows:
1) Calculate the row mean (4.75) for the first row.
2) Subtract each value from the row mean (e.g. 4.75-1, 4.75-4, 4.75-6, 4.75-8).
3) Square each difference.
4) Add them up and divide by the number of samples in row 1.
5) This gives the variance (6.685), and from it the standard deviation (2.58) of the first row alone.
6) Then apply the z-score formula.
The zscore function, whatever it is, seems to be the same as scale in the base package.
apply(data, 1, scale)
## [,1] [,2] [,3]
## [1,] -1.2558275 -1.2649111 -1.0883839
## [2,] -0.2511655 -0.3162278 -0.4186092
## [3,] 0.4186092 0.6324555 0.2511655
## [4,] 1.0883839 0.9486833 1.2558275
For each row of data (each column of the output), it is calculating (x - mean(x)) / sd(x). Note that sd() uses the n - 1 denominator (the sample standard deviation), whereas your manual calculation divided by n; that is why your hand-computed values differ from the output above.
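A quick check of the first row by hand (a small sketch) reproduces the first column of the output once sd() is used:
x <- c(1, 4, 6, 8)        # first row of `data`
(x - mean(x)) / sd(x)     # sd() divides by n - 1
## [1] -1.2558275 -0.2511655  0.4186092  1.0883839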
I need to solve for a n x n (n usually <12) matrix subject to a few constraints:
1. Predetermined row and column sums are satisfied.
2. Each element in the matrix having a row number greater than its column number must be zero (so basically the only nonzero elements are in the top right portion).
3. For a given row, every element more than three columns to the right of the first nonzero element must also be zero.
So, a 4x4 matrix might look something like this (the row and column constraints will be much larger in practice, usually around 1-3 million):
|3 2 1 0| = 6
|0 2 1 1| = 4
|0 0 2 1| = 3
|0 0 0 4| = 4
3 4 4 6
I have been trying some solver approaches to do this in Excel and have also tried some R-based optimization packages, but have been unsuccessful so far.
Any suggestions on how else I might approach this would be much appreciated.
Thanks!
Test data:
x <- c(2,2,2,1,1,1,1)
rowVals <- c(6,4,3,4)
colVals <- c(3,4,4,6)
Function to construct the appropriate test matrix from (3N-5) parameters:
makeMat <- function(x, n) {
  ## first and last diagonal elements are fixed by the column/row sums
  ## (rowVals and colVals are taken from the global environment)
  diagVals <- c(colVals[1], x[1:(n-2)], rowVals[n])
  ## the first and second superdiagonals come from the remaining parameters
  sup2Vals <- x[(n-1):(2*n-3)]    # n-1 values for the first superdiagonal
  sup3Vals <- x[(2*n-2):(3*n-5)]  # n-2 values for the second superdiagonal
  ## set up the matrix
  m <- diag(diagVals)
  m[row(m) == col(m) - 1] <- sup2Vals
  m[row(m) == col(m) - 2] <- sup3Vals
  m
}
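As a sanity check (not in the original answer), calling makeMat on the test vector reproduces the example matrix from the question:
makeMat(x, n = 4)
##      [,1] [,2] [,3] [,4]
## [1,]    3    2    1    0
## [2,]    0    2    1    1
## [3,]    0    0    2    1
## [4,]    0    0    0    4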
Objective function (sum of squares of row & column deviations):
objFun <- function(x, n) {
  m <- makeMat(x, n)
  ## sum of squared deviations from the row/column sum constraints
  sum((rowVals - rowSums(m))^2 + (colVals - colSums(m))^2)
}
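For the test vector the objective is exactly zero, confirming that x satisfies all of the row and column constraints:
objFun(x, n = 4)
## [1] 0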
Optimizing:
opt1 <- optim(fn=objFun,par=x,n=4)
## recovers original values, although it takes a lot of steps
opt2 <- optim(fn=objFun,par=rep(0,length(x)),n=4)
makeMat(opt2$par,n=4)
## [,1] [,2] [,3] [,4]
## [1,] 3 2.658991 0.3410682 0.0000000
## [2,] 0 1.341934 1.1546649 1.5038747
## [3,] 0 0.000000 2.5042858 0.4963472
## [4,] 0 0.000000 0.0000000 4.0000000
##
## conjugate gradients might be better
opt3 <- optim(fn=objFun,par=rep(0,length(x)),n=4,
method="CG")
It seems that there are multiple solutions to this problem, which isn't surprising (since there are 2N constraints on (N-2)+(N-1)+(N-2) = 3N-5 parameters).
You didn't say whether you need integer solutions or not; if so, you will need more specialized tools ...
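If integer solutions are required, one option (a sketch, assuming the lpSolve package is available; the variable names here are illustrative) is to treat every cell of the matrix as a nonnegative integer variable, impose the row and column sums as equality constraints, and force the structurally forbidden cells to zero:
library(lpSolve)

n       <- 4
rowVals <- c(6, 4, 3, 4)
colVals <- c(3, 4, 4, 6)

## Only the diagonal and the first two superdiagonals may be nonzero
zero_cells <- which(outer(1:n, 1:n, function(i, j) i > j | j > i + 2))

## One constraint row per row sum, column sum, and structural zero
row_con  <- t(sapply(1:n, function(i) { m <- matrix(0, n, n); m[i, ] <- 1; as.vector(m) }))
col_con  <- t(sapply(1:n, function(j) { m <- matrix(0, n, n); m[, j] <- 1; as.vector(m) }))
zero_con <- t(sapply(zero_cells, function(k) { v <- numeric(n * n); v[k] <- 1; v }))

sol <- lp("min",
          objective.in = rep(0, n * n),   # any feasible point will do
          const.mat    = rbind(row_con, col_con, zero_con),
          const.dir    = rep("=", 2 * n + length(zero_cells)),
          const.rhs    = c(rowVals, colVals, rep(0, length(zero_cells))),
          all.int      = TRUE)            # lpSolve variables are nonnegative by default

matrix(sol$solution, n, n)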
Hi, I tried calculating the autocorrelation at lags u = 1, ..., 9.
I expect a 9 x 1 vector of autocorrelations. However, when I use this code it always gives me a 10 x 1 vector with the first term = 1. I am not sure how to proceed.
# initialize a vector to store autocovariance
maxlag <- 9
varstore <- rep(NA,maxlag)
# Calculate Variance
varstore[1] <- sd(as.vector(sample1),na.rm=T)^2
# Estimate autocovariances for all residuals
for (lag in 1:maxlag)
varstore[lag+1] <- mean(sample1[,1:(10-lag)] *
sample1[,(lag+1):10],na.rm=T)
print(round(varstore,3))
# calculate autocorrelations
corrstore <- varstore/varstore[1]
print(corrstore)
And this is what I get:
[1] 1.0000000 0.6578243 0.5670389 0.5292314 0.5090411 0.4743944 0.4841038 0.4756297
[9] 0.4275208 0.4048436
You get a vector of length 10 because assigning past the end of a vector extends it: varstore starts with length 9 (rep(NA, maxlag)), but the loop writes to varstore[lag+1], i.e. indices 2 through 10, so at lag = maxlag (the last step of your for loop) varstore[lag+1] creates a new element at position 10. The first term of corrstore is 1 because varstore[1] holds the lag-0 variance, and varstore[1]/varstore[1] = 1. To see the vector-extension behaviour clearly, try this for example:
v <- NA ## a vector of length 1
v[10] <- 2
v
[1] NA NA NA NA NA NA NA NA NA 2 ## you get a vector of length 10!!
That said, why do you want a vector of length 9? Why not use the acf function? Here is the output of the acf function:
length(acf(1:10)$lag)
[1] 10
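(acf includes the lag-0 term, so lags 0 through 9 give 10 values.) If you only want the nine autocorrelations at lags 1 to 9, here is a minimal sketch (assuming the data is a single numeric series; your sample1 looks like a matrix, so adapt accordingly):
x <- rnorm(100)                              # example series
r <- acf(x, lag.max = 9, plot = FALSE)$acf   # lags 0..9, length 10
r[-1]                                        # drop the lag-0 term (always 1): 9 values
## with your original code, the equivalent fix is simply corrstore[-1]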
Suppose I have a matrix like so:
data=matrix(c(1,0,0,0,0,0,1,0,0.6583,0,0,0,1,0,0,0,0.6583,0,1,0,0,0,0,0,1),nrow=5,ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0.0000 0 0.0000 0
[2,] 0 1.0000 0 0.6583 0
[3,] 0 0.0000 1 0.0000 0
[4,] 0 0.6583 0 1.0000 0
[5,] 0 0.0000 0 0.0000 1
How do I create another matrix, say "data2", such that it has the same number of off-diagonal nonzero elements as "data", but in locations other than the ones in data? The randomly simulated values should be uniform (so runif).
Here is a somewhat clumsy way to do this. It works well for small matrices but would be too slow if you're going to use this for some very high-dimensional problems.
# Current matrix:
data=matrix(c(1,0,0,0,0,0,1,0,0.6583,0,0,0,1,0,0,0,0.6583,0,1,0,0,0,0,0,1),nrow=5,ncol=5)
# Number of nonzero elements in upper triangle:
no.nonzero<-sum(upper.tri(data)*data>0)
# Generate same number of new nonzero correlations:
new.cor<-runif(no.nonzero,-1,1)
# Create new diagonal matrix:
p<-dim(data)[1]
data2<-diag(1,p,p)
### Insert nonzero correlations: ###
# Step 1. Identify the places where the nonzero elements can be placed:
pairs<-(p^2-p)/2 # Number of elements in the upper triangle
combinations<-matrix(NA,pairs,2) # Index pairs for those elements, i.e. (1,2), (1,3), ..., (2,3), ... and so on
k<-0
for(i in 1:(p-1)) {
  for(j in (i+1):p) {
    k<-k+1
    combinations[k,]<-c(i,j)
  }
}
# Step 2. Randomly pick indices:
places<-sample(1:k,no.nonzero)
# Step 3. Insert nonzero correlations:
for(i in 1:no.nonzero) {
  data2[combinations[places[i],1],combinations[places[i],2]] <-
    data2[combinations[places[i],2],combinations[places[i],1]] <- new.cor[i]
}
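As a quick (illustrative) check, both matrices should end up with the same number of nonzero off-diagonal entries; only the positions differ (they are drawn at random):
sum(data[upper.tri(data)] != 0)    # nonzero upper-triangle cells in the original
sum(data2[upper.tri(data2)] != 0)  # same count in the new matrix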
I have not really understood the question. There are two off-diagonal, non-zero elements (0.6583) in the example, right? Is a vector with those two elements the result you want in this case?
data=matrix(c(1,0,0,0,0,0,1,0,0.6583,0,0,0,1,0,0,0,0.6583,0,1,0,0,0,0,0,1),nrow=5,ncol=5)
# Convert to vector
data2 <- as.numeric(data)
# Remove diagonal
data2 <- data2[as.logical(upper.tri(data) | lower.tri(data))]
# Remove 0 elements
data2 <- data2[data2 != 0]
data2
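On the example matrix this should print just the two nonzero off-diagonal values:
## [1] 0.6583 0.6583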